Problem: Given a multivariate data-set
attributes | |||||
---|---|---|---|---|---|
@1 | @2 | ... | @k | ||
things | x1 | x11 | x12 | ... | x1k |
x2 | x21 | x22 | ... | x2k | |
... | ... | ... | ... | ... | |
xn | xn1 | xn2 | ... | xnk |
Does the data-set consist of one population or of two or more sub-populations (clusters, classes, components, species)?
Is it well described by a mixture of models, one per sub-population?
users.monash.edu.au/~lloyd/Seminars/2005-II/Mixture/index.shtmlestMixture ests dataSet = let -- [estimator]->[dataSpace] -> model of dataSpace -- i.e. [estimator] -> estimator ...
Takes a list of estimators, one per component of the mixture.
memberships (Mix mixer components) = let -- memberships|Mixture doAll (d:ds) = prepend (doOne d) (doAll ds) -- all data doAll [] = map (\x -> []) components doOne datum = normalise( -- one datum zipWith (\c -> \m -> (pr mixer c)*(pr m datum)) [0..] components) -- pr(c) * pr(datum|c) for class #c = m in doAll dataSet
Given the components of the mixture, find (fit) the fractional membership weights of things (data) in (to) the components.
randomMemberships = let doAll seed [] = map (\_ -> []) ests doAll seed (_:ds) = -- all data let doOne seed [] ans = (seed, normalise ans) doOne seed (_:ests) ans = -- one datum doOne (prng seed) ests ((fromIntegral(1+ seed `mod` 10)) : ans) in let (seed2, forDatum) = doOne seed ests [] in prepend forDatum (doAll seed2 ds) in doAll 4321 dataSet
Allocate initial pseudo-random (prng) fractional membership weights to things (data), not very interesting.
fit [] [] = [] -- Models|memberships fit (est:ests) (mem:mems) = (est dataSet mem) : (fit ests mems) fitMixture mems = Mix (freqs2model (map (foldl (+) 0) mems)) -- weights (fit ests mems) -- components
From the membership weights, calculate mixture-weights of the components and fit components (use the given estimators) to their weighted members.
cycle mx = fitMixture (memberships mx) -- EM step cycles 0 mx = mx cycles n mx = cycles (n-1) (cycle mx) -- n x cycle in mixture( cycles ?? (fitMixture randomMemberships) )
Fit memberships to components; fit components to the memberships. Iterate, either some number of times or until convergence.
-- That's all --