
CSE423 Progress 1999

  1. Thursday 22 July 1999, S15, 3pm-5pm (LA)
    Introduction & administrivia. Probability: Bayes' thm, joint probability, dependent & independent events, marginal probability, prior & posterior probability (of hypothesis etc.), likelihood. Information. Entropy (definition of). Examples: coins and dice, fair and biased.
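    [A minimal numerical sketch of the entropy definition, in Python; the code and numbers are illustrative, not from the lecture:]
      from math import log2

      def entropy(ps):                       # H = -sum p*log2(p), in bits
          return -sum(p * log2(p) for p in ps if p > 0)

      print(entropy([0.5, 0.5]))             # fair coin: 1.0 bit
      print(entropy([0.9, 0.1]))             # biased coin: ~0.469 bits
      print(entropy([1.0 / 6] * 6))          # fair die: ~2.585 bits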
  2. Thursday 29 July 1999, S15, 3pm-5pm (LA)
    [Information] inequality, Kullback-Leibler distance (defn) and example on a discrete 4-state distribution.
    [Coding] prefix property and others, Kraft inequality and proof, Shannon coding thm (defn), Huffman code algorithm, arithmetic coding (basic idea and properties). (Numerical sketch below.)
    [prac #1]  
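    [Illustrative Python sketch of the K-L distance and the Kraft inequality; the 4-state distribution here is invented, not necessarily the lecture's example:]
      from math import log2

      def kl(p, q):                          # D(p||q) = sum p*log2(p/q) >= 0
          return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

      p = [0.7, 0.1, 0.1, 0.1]               # a discrete 4-state distribution
      q = [0.25, 0.25, 0.25, 0.25]
      print(kl(p, q), kl(q, p))              # ~0.643 v. ~0.620: asymmetric

      # Kraft: prefix-code lengths l_i must satisfy sum 2^-l_i <= 1.
      lengths = [1, 2, 3, 3]                 # Huffman code lengths for p above
      print(sum(2.0 ** -l for l in lengths)) # 1.0, so a prefix code exists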
  3. Thursday 5 August 1999, S15, 3pm-5pm (LA)
    2-state and binomial distributions: three methods of encoding, (i) combinatorial, (ii) adaptive, (iii) inference (two-part); code lengths satisfy (i) = (ii) = (iii) - delta, i.e. the two-part inference message is only slightly longer. Accuracy of the parameter estimate (inference). (Sketch below.)
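    [Python sketch checking that (i) = (ii) for a 2-state sequence; a uniform prior is assumed for the adaptive code (hence the Laplace rule), which is my choice for the illustration:]
      from math import comb, log2

      def combinatorial_len(seq):            # send #1s, then which arrangement
          n, h = len(seq), sum(seq)
          return log2(n + 1) + log2(comb(n, h))

      def adaptive_len(seq):                 # Laplace rule: P(1) = (h+1)/(i+2)
          bits, h = 0.0, 0
          for i, x in enumerate(seq):
              p1 = (h + 1) / (i + 2)
              bits += -log2(p1 if x else 1 - p1)
              h += x
          return bits

      s = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
      print(combinatorial_len(s), adaptive_len(s))   # identical, as claimed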
  4. Thursday 12 August 1999, S15, 3pm-5pm (LA)
    General form of message length using the Fisher information: (i) one continuous parameter (e.g. Poisson), (ii) multiple continuous parameters, and (iii) application to multi-state distributions; worked through the 3-state dist'n (2 params) -- it's good for the soul! (The general form is sketched after this entry.)
    Distributions and codes for +ve Integers: (i) geometric (related to 2-state), (ii) Poisson, (iii) 1/(n.(n+1)), (iv) log* and relatives.
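    [For reference, the general form above is, in the usual MML87 (Wallace & Freeman) notation -- prior h, Fisher information F (determinant for D parameters), likelihood f, lattice constant kappa_D (kappa_1 = 1/12):]
      \[ \mathrm{MsgLen}(\theta, x) \approx -\log h(\theta) + \tfrac{1}{2}\log|F(\theta)| - \log f(x\,|\,\theta) + \tfrac{D}{2}\left(1 + \log \kappa_D\right) \]
    [And a Python sketch of some code lengths for +ve integers; the constant 2.865 is Rissanen's normalizer for the log* code:]
      from math import log2

      def geometric_len(n, p=0.5):           # P(n) = (1-p)^(n-1) * p, n >= 1
          return -(n - 1) * log2(1 - p) - log2(p)

      def reciprocal_len(n):                 # P(n) = 1/(n(n+1)), n >= 1; sums to 1
          return log2(n * (n + 1))

      def log_star_len(n):                   # ~ log2(c) + log2 n + log2 log2 n + ...
          bits, t = log2(2.865064), log2(n)  # c ~ 2.865 makes the code lengths valid
          while t > 0:                       # add only the positive terms
              bits += t
              t = log2(t)
          return bits

      for n in (1, 5, 100):                  # geometric punishes large n; log* doesn't
          print(geometric_len(n), reciprocal_len(n), log_star_len(n))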
  5. Thursday 19 August 1999, S15, 3pm-5pm (DLD)
    Fisher for Gaussian (Normal). K-L for multinomial (again), Gaussian and Poisson. Probabilistic footy-tipping. Von Mises circular distribution. Snob: unsupervised classification.
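    [For the record, the Gaussian case works out as follows for N samples (a standard result, given here as a reminder):]
      \[ \log f = -N\log(\sigma\sqrt{2\pi}) - \sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma^2}, \qquad F(\mu,\sigma) = \begin{pmatrix} N/\sigma^2 & 0 \\ 0 & 2N/\sigma^2 \end{pmatrix}, \qquad |F(\mu,\sigma)| = \frac{2N^2}{\sigma^4}, \]
    so the (1/2)log|F| term of the message length is log(sqrt(2).N/sigma^2).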
  6. Thursday 26 August 1999, S15, 3pm-5pm (DLD)
    Mixtures of normal distributions, definite assignment & bias v. partial assignment (and coding trick; toy sketch below). See mixture modelling.
    NB. Prac#1 due about now.
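    [Toy Python sketch of partial (fractional) assignment for a 2-component normal mixture; the weights and parameters are invented for illustration:]
      from math import exp, log2, pi, sqrt

      def npdf(x, mu, sigma):                # normal density
          return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

      def responsibilities(x, comps):        # comps: list of (weight, mu, sigma)
          js = [w * npdf(x, mu, s) for (w, mu, s) in comps]
          tot = sum(js)
          # fractional memberships, and msg len from the *summed* probability
          return [j / tot for j in js], -log2(tot)

      comps = [(0.7, 0.0, 1.0), (0.3, 4.0, 1.0)]
      r, bits = responsibilities(2.0, comps)
      print(r, bits)  # partial assignment avoids the bias of definite assignment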
  7. Thursday 2 September 1999, S15, 3pm-5pm (DLD)
    Decision-trees (aka classification-trees) and graphs: supervised classification (expert systems).
  8. Thursday 9 September 1999, S15, 3pm-5pm (DLD)
    Coding continuous attribute split-values in decision (classification) -trees. For leaf nodes, see the multinomial distribution.
    Start on probabilistic finite-state automata/machines (PFSA/PFSM).
  9. Thursday 16 September 1999, S15, 3pm-5pm
    Part 1 (DLD): discussed D-graphs and joint probabilities in particular, also prac#2 and PFSAs.
    See TR32 Wallace & Georgeff
    NB. There are some FSA techniques in J.Teuhola & T.Raita, Application of a finite-state model to text compression, Computer Journal 36(7) pp607-614, 1993, that are not MML but that may be useful in the "search".
    Part 2 (LA): Tim Edgoose's extension to Snob to deal with sequences of data when items are not independent. See also: T.Edgoose & L.Allison, Minimum Message Length Hidden Markov Modelling, IEEE Data Compression Conf. pp169-178, Snowbird, March 1998. (See the school/department tech' report TR96/285 for more details.)
  10. Thursday 23 September 1999, S15, 3pm-5pm
    Sequence Similarity. Ad-hoc alignment methods: Levenshtein metric, Longest Common Subsequence (LCS, LCSS), 2-D dynamic programming algorithm, minimize some "weights" or maximize some "scores". Most probable alignment, 1-state, 3-state etc. models of sequence mutation / relatedness. Sum of probabilities over all alignments; average can be better (i.e. more appropriate) than optimal! See TR-90/148 (Note relation to sum over all possible class-membership assignments in Markov-Snob, as in last week).
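    [Illustrative Python: the classic 2-D dynamic programming recurrence for the Levenshtein metric. Note that the same recurrence with costs = -log prob's, and min replaced by a sum of probabilities, gives -log P(all alignments), the "average" quantity above:]
      def levenshtein(a, b):                 # 2-D dynamic programming
          m, n = len(a), len(b)
          d = [[0] * (n + 1) for _ in range(m + 1)]
          for i in range(m + 1): d[i][0] = i
          for j in range(n + 1): d[0][j] = j
          for i in range(1, m + 1):
              for j in range(1, n + 1):
                  d[i][j] = min(d[i-1][j] + 1,                       # delete
                                d[i][j-1] + 1,                       # insert
                                d[i-1][j-1] + (a[i-1] != b[j-1]))    # match/change
          return d[m][n]

      print(levenshtein("AGCT", "ACT"))      # 1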

    NB. 27 Sept' - 1 Oct' is semester break.

  11. Thursday 7 October 1999, S15, 3pm-5pm
    Part 1 (LA): Can compress a single sequence by the Lempel-Ziv method or, better, by the sum of prob's of all explanations under LZ. For DNA we need to allow change, insert and delete within copied sections ... this amounts to aligning a sequence with itself, and is effected by a small probabilistic finite-state automaton (PFSA). It shows repeat structure in a sequence.
    2-sequence alignment methods (23 Sept) usually assume that any one seq' is random, i.e. incompressible. Can (TR-98/14) change this by using a model of each seq' to provide prob's of "the next char(s)"; take -log ... to give weights for characters. (Toy sketch after this entry.)
    Part 2 (DLD): Selection of the order of a polynomial model (csw tech report and murlie+csw paper handed out).
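    [Toy Python sketch of the Part 1 idea: an order-0 adaptive model stands in for a real per-sequence model (those of TR-98/14 may be much richer, e.g. LZ-like); its -log2 prob's give per-character alignment weights:]
      from math import log2

      def predictive_weights(seq, alphabet="ACGT"):
          counts = {c: 1 for c in alphabet}  # Laplace-style adaptive counts
          weights, total = [], len(alphabet)
          for ch in seq:
              weights.append(-log2(counts[ch] / total))   # -log2 P(next char)
              counts[ch] += 1
              total += 1
          return weights                     # per-character costs, in bits

      print(predictive_weights("AACGTACGT"))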
  12. Thursday 14 October 1999, S15, 3pm-5pm
    Revision, remember -- joint-, marginal-, conditional-, prior- and posterior-probabilities? Also coding, Kullback-Leibler distance, and 2-state, multi-state & normal distributions. Mixture modelling and clustering, not forgetting fractional assignment. Decision (classification) -trees & -graphs, probabilistic finite-state automata & sequences of symbols, or pairs of sequences (edit distance & MML), and sequence compression. Always considering the two-part message, parameter precision, nuisance parameters, bias or not, invariance or not, of MML and of other estimators such as maximum likelihood and maximum posterior probability.


D. Dowe & L. Allison © 1999, Comp Sci and SWE, Monash University