# How much Information can one learn about a Binomial Distribution?

The binomial provides a simple example about information. It

- is important, e.g., consider
  {+ve, -ve}, {true, false}, {H, T}, {purine, pyrimidine}, {hydrophilic, hydrophobic}, . . . , and
- is a component of many structured models, e.g.,
  Markov models, classification trees, Bayesian nets, segmentation (of a binary sequence or image), . . .

2-state:
Pr(H) = θ,     0 ≤ θ ≤ 1,
Pr(T) = 1 - θ
Binomial, n trials:
Pr(#H = k)  =  C(n, k) θ^k (1 - θ)^(n-k),   0 ≤ k ≤ n
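For concreteness, a minimal sketch of this pmf in Python, using the standard library's `math.comb` for C(n, k):

```python
from math import comb

def binomial_pmf(k, n, theta):
    """Pr(#H = k) for n independent 2-state trials with Pr(H) = theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# e.g. the probability of exactly 6 heads in 12 fair-coin trials
print(binomial_pmf(6, 12, 0.5))  # 924/4096 = 0.2255859375
```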

How much information can one learn about θ? . . .
users.monash.edu.au/~lloyd/Seminars/2009-Binomial/index.shtml
( BTW the question is important, e.g.,

the win/loss record of an AFL team:

L W L L W L W W L W L W

The team changed coach after round 6. Has it paid off?
(i)  6 wins from 12 (null: any difference is due to chance) v.
(ii) 2 wins from 6  ++  4 wins from 6 (a separate distribution per coach)

Including the cost of stating two distributions, (ii), v. just one distribution, (i), guards against overfitting.  )
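The trade-off can be illustrated numerically. This sketch compares the data costs only, in bits, using maximum-likelihood estimates; a full two-part message would also pay to state the estimates themselves, which is what penalises (ii):

```python
from math import log2

def nll_bits(wins, n, theta):
    """Negative log2-likelihood, in bits, of `wins` wins in n games
    under Pr(win) = theta."""
    return -(wins * log2(theta) + (n - wins) * log2(1 - theta))

# (i) one distribution for all 12 rounds, MLE theta = 6/12
one = nll_bits(6, 12, 6/12)
# (ii) one distribution per coach, MLE thetas 2/6 and 4/6
two = nll_bits(2, 6, 2/6) + nll_bits(4, 6, 4/6)
print(one, two)  # the data cost alone favours (ii), but a two-part
                 # message must also pay to state two estimates, not one
```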

How much information can one learn about θ?

Jack and Jill (devout Bayesians) design a code-book for binomials.

Then, as so often happens, they drift apart . . .
. . . but they do keep in contact.

Jill sends Jack  (i) n, (ii) a code-book (C.B.) entry, & (iii) the data given that entry.

In (ii) Jill chooses an entry (a theory, an estimate) for θ out of the code book.
How would Jack and Jill design an optimal code book?

2^n possible sequences of length n, but
wrt θ, only #H matters.
#H ∈ [0, n].

The book's page for a given 'n' corresponds to a partition of [0, n]

[diagram: the partitions of [0, n] range from the simplest, one interval [0 . . . n], to the most complex, n + 1 singletons [0][1][2][3] . . . [n]]

Generally the extreme possibilities are not optimal.

And they choose a prior for θ   (uniform is the simplest option).
There is a polynomial-time algorithm to design the page for 'n':

Construct a complete DAG whose nodes are the possible interval boundaries; each edge corresponds to one candidate interval.

The weight of an edge is the contribution to the expected message-length of the corresponding part of the partition of [0, n].

Seek the shortest path from far left to far right.
Demo (an interactive applet on the seminar page): set n.
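The shortest-path construction can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not necessarily the seminar's exact weights: a uniform prior on θ (so marginally Pr(#H = k) = 1/(n+1)), each interval's estimate taken as its mean #H divided by n, and the second part coding the sequence with that estimate.

```python
from math import log2

def page_for_n(n):
    """Design the code-book page for n: shortest path in a complete DAG
    whose nodes are boundaries 0..n+1; edge (i, j) is the interval of
    #H values [i, j-1].  Returns (partition, expected length in bits)."""

    def weight(lo, hi):
        """Expected contribution of interval [lo, hi] to the two-part
        message length, under the assumptions stated above."""
        width = hi - lo + 1
        first = -log2(width / (n + 1))      # cost of stating this entry
        theta = ((lo + hi) / 2) / n         # assumed estimate for the interval
        total = 0.0
        for k in range(lo, hi + 1):
            second = 0.0
            if theta > 0:
                second -= k * log2(theta)
            if theta < 1:
                second -= (n - k) * log2(1 - theta)
            total += (first + second) / (n + 1)   # expectation over k
        return total

    INF = float('inf')
    dist = [INF] * (n + 2)   # shortest distance to each boundary
    back = [0] * (n + 2)     # predecessor, to recover the partition
    dist[0] = 0.0
    for j in range(1, n + 2):
        for i in range(j):
            w = dist[i] + weight(i, j - 1)
            if w < dist[j]:
                dist[j], back[j] = w, i
    # walk back from the far right to recover the chosen intervals
    parts, j = [], n + 1
    while j > 0:
        parts.append((back[j], j - 1))
        j = back[j]
    return list(reversed(parts)), dist[n + 1]
```

The DP runs in O(n^2) edges times O(n) per weight, comfortably polynomial time as claimed.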

Observing #H, Jill selects a code book entry (interval [lo, hi], estimate θ) and transmits the entry and the data given that entry.

The amount of information transmitted about θ is   - log( (hi - lo + 1) / (n + 1) ).
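In code (a small sketch; base-2 logs, so the answer is in bits):

```python
from math import log2

def info_about_theta(lo, hi, n):
    """Bits of information transmitted about theta when the chosen
    code-book entry is the interval [lo, hi] of #H values, the uniform
    prior giving each of the n+1 values equal weight."""
    return -log2((hi - lo + 1) / (n + 1))

# e.g. n = 12: a singleton interval carries the most information,
# the whole of [0, 12] carries none.
print(info_about_theta(6, 6, 12))   # log2(13) ~ 3.7 bits
print(info_about_theta(0, 12, 12))  # 0.0 bits
```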