# Information and the multinomial probability distribution

How much information is there in a discrete sequence, e.g.,

. . . A C G A C T A C G T A C . . .

DNA = {A, C, G, T},  θ = <pr(A), pr(C), pr(G)>,  pr(T) = 1 - pr(A) - pr(C) - pr(G)

or, e.g.,

H H T H T H H T H ?

coin={H,T},   θ = pr(H) = 1 - pr(T)
. . . is an important question, for example,

is the marked segment (red in the original) significantly more CG-rich than the rest?

A T A G C G G C G T C G T A G T A A A C T C C G G C T A C C A C T G G A G T G C T A C A A T G T C C C A A C G C T G C A

i.e., two (or three?) distributions v. one.
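In MML terms, "significantly" can be read as a comparison of total code lengths: one distribution for the whole sequence versus separate distributions for the segments. A small Python sketch along those lines, reusing the adaptive code length of option 2 below; the split point k is purely hypothetical (a real segmentation model would also pay bits to state where the boundary is):

```python
from math import lgamma, log

LOG2 = log(2.0)

def lg2fact(x):
    # log2(x!) via lgamma, avoiding huge integers
    return lgamma(x + 1) / LOG2

def adaptive_len(counts):
    # Adaptive-code length in bits for M-state data with the given
    # per-symbol counts: log2( (n+M-1)! / ((M-1)! * prod(counts!)) )
    n, m = sum(counts), len(counts)
    return lg2fact(n + m - 1) - lg2fact(m - 1) - sum(lg2fact(c) for c in counts)

def acgt_counts(seq):
    return [seq.count(s) for s in "ACGT"]

dna = "ATAGCGGCGTCGTAGTAAACTCCGGCTACCACTGGAGTGCTACAATGTCCCAACGCTGCA"

one_model = adaptive_len(acgt_counts(dna))        # one distribution
k = 30                                            # hypothetical boundary
two_models = (adaptive_len(acgt_counts(dna[:k]))
              + adaptive_len(acgt_counts(dna[k:])))  # two distributions

print(f"one: {one_model:.1f} bits, two: {two_models:.1f} bits")
```

Whichever total is shorter (after also paying for the boundary) wins the comparison.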

There are at least three "obvious" ways to transmit multinomial data . . .
1. Combinatorial Code,

e.g., for binary data

n trials,   0 ≤ #H ≤ n,   n = #H + #T

n+1 possible values for #H   (all equally likely under a uniform prior)

requires log( n + 1 ) bits to specify #H, and then

log( C(n, #H) ) = log( n! / (#H! #T!) ) bits to specify the sequence | #H;  so the total

code length = log( (n+1)! / (#H! #T!) ) bits
(see log x!)
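A small Python sketch of the two parts, on the coin sequence from above; note that log2(n+1) + log2( n!/(#H! #T!) ) telescopes to log2( (n+1)!/(#H! #T!) ):

```python
from math import comb, log2

def combinatorial_len(seq):
    # log2(n+1) bits to state #H, then log2(C(n, #H)) bits to pick
    # the actual sequence among all those with that many H's.
    n, h = len(seq), seq.count('H')
    return log2(n + 1) + log2(comb(n, h))

print(combinatorial_len("HHTHTHHTH"))   # ≈ 9.71 bits (n = 9, #H = 6)
```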

2. Adaptive Code,

e.g., for binary data

| trial     | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9    |
|-----------|-----|-----|-----|-----|-----|-----|-----|-----|------|
| data      | H   | H   | T   | H   | T   | H   | H   | T   | H    |
| H counter | 1   | 2   | 3   | 3   | 4   | 4   | 5   | 6   | 6    |
| T counter | 1   | 1   | 1   | 2   | 2   | 3   | 3   | 3   | 4    |
| total     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10   |
| pr(data)  | 1/2 | 2/3 | 1/4 | 3/5 | 2/6 | 4/7 | 5/8 | 3/9 | 6/10 |

counters initialised to 1 (cannot be 0)

Multiplying the pr(data) row, the numerators contribute #H! · #T! and the denominators contribute 2·3·...·(n+1) = (n+1)!, so the

code length = log( (n+1)! / (#H! #T!) ) bits
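The same total can be checked by replaying the counter table in Python, each symbol costing -log2 of its adaptive probability:

```python
from math import factorial, log2

def adaptive_len(seq):
    count = {'H': 1, 'T': 1}           # counters initialised to 1
    bits = 0.0
    for s in seq:
        # symbol s is coded with probability counter(s) / total
        bits -= log2(count[s] / (count['H'] + count['T']))
        count[s] += 1                  # then its counter is incremented
    return bits

seq = "HHTHTHHTH"
n, h, t = len(seq), seq.count('H'), seq.count('T')
print(adaptive_len(seq))                                        # ≈ 9.71 bits
print(log2(factorial(n + 1) // (factorial(h) * factorial(t))))  # same total
```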
3. Estimate-based Code

(i)  state an estimate, est(θ), of θ, to optimum precision ±δ/2 (about −log( prior(est(θ)) · δ ) bits), and

(ii) state data | est(θ)

It turns out that

est_MML(θ) is pr(value_i) = (#value_i + 1/2) / (n + M/2), i = 1, ..., M, for M-state data, and

the "penalty", over codes 1 and 2, for stating (i) and (ii), that is for including an opinion, (i), is only a fraction of a bit per parameter.
## Summary

Codes 1 and 2 give equal code lengths; code 3 is very slightly longer.

(Non-uniform prior: Pretend there were some previous observations.)
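That pretence is just a different initialisation of the adaptive counters; a sketch, with purely illustrative prior counts of 3 H's and 1 T:

```python
from math import log2

def adaptive_len(seq, prior={'H': 1, 'T': 1}):
    # Starting a counter above 1 encodes a non-uniform prior:
    # it is as if those extra observations had already been seen.
    count = dict(prior)
    bits = 0.0
    for s in seq:
        bits -= log2(count[s] / (count['H'] + count['T']))
        count[s] += 1
    return bits

seq = "HHTHTHHTH"
print(adaptive_len(seq))                    # uniform prior: ≈ 9.71 bits
print(adaptive_len(seq, {'H': 3, 'T': 1}))  # prior leaning H: ≈ 9.37 bits
```

A prior that leans the right way shortens the code; one that leans the wrong way lengthens it.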

Demo: http://www.allisons.org/ll/MML/Discrete/2state/#demo