Case Study: Mixture of Markov Models18 December 2003 |
|
A colleague mentioned that he was interested in mixture models over Markov models: Given a set of sequences, S=[s0, s1,...], over a finite alphabet, each sequence is believed to come from a Markov model but it is not known how many different components (Markov models) are involved, nor what their parameters are. The [200309/] Haskell code for inductive inference includes a basic estimator for mixture models (unsupervised classification, intrinsic classification, clustering), so it seemed that it should be easy to create a prototype estimator for mixtures of Markov models. The first requirement is for a
weighted[*] estimator
for a Markov model.
A Markov model of A weighted estimator, estFiniteFunctionWeighted,
for a finite function-model already exists.
It can be used by feeding it the inputs
[s0,0,
s0,1,
...,
s0,|s0|-2,
s1,0,
...]
with the corresponding outputs
[s0,1,
s0,2,
...,
s0,|s0|-1,
s1,1,
...]
and the corresponding weights
[w0,
w0,
...,
w0,
w1,
...]
where wi is the weight of The finite function-model, ff, leads to a predictor function for a finite list function-model, flf. flf is converted into a time-series model by functionModel2timeSeries and thence into a model of sequences by timeSeries2model:
The estimator can be used with estMixture to investigate mixtures of two (or more) components, e.g.
It can be seen that there would be no great difficulty in
considering Markov models of order k=2 or more.
In fact unweighted estimators for
finite list function-models and for Markov models of arbitrary
Note that in the above statistical model, each sequence is
considered to have come from a single Markov model.
This is quite different from a situation where an individual sequence may be
heterogeneous, The exercise suggests that it might be more convenient if it were standard for an estimator for a time-series model to work from a set (list) of data-series, rather than from a single data-series.
|
|
|