[01]
>>
Or how to turn a ModelType into a SuperModel.
L. Allison,
CSSE, bldg26, #135,
Monash University,
Australia 3168.
2pm, 9 December 2002.
Also see:
[ACSC2003]
[HICS2003]
"A supermodel is born.
You can make models into good models but a supermodel ...
God made her." -- Eileen Ford.
(But what does she know?)
Abstract: The functional programming language Haskell98,
with its polymorphic types and type-classes,
is used to analyse and define the nature of some
problems and solutions (tools) in machine learning and data mining.
Data types and type-classes for statistical models are developed
that allow models to be manipulated in a precise, type-safe and flexible way.
The statistical models considered include
probability distributions, mixture models, function-models, time-series, and
classification- and function-model-trees.
The aim is to improve ways of designing and programming with models,
not just applying them.
This document can be found at users.monash.edu.au/~lloyd/Seminars/200212-MMLFP/index.shtml
and includes hyper-links to other resources.
<<
[02]
>>
Ask
- What are the products of research in
machine learning and data mining?
- What do they do?
- Who (what) do they associate with?
- How do they behave?
Because
- The chance that a fixed data-analysis program,
even a very general one,
suits an arbitrary problem well must be small,
- so as well as just using models,
we should start programming with them.
- Therefore we must examine exactly what they are.
<<
[03]
>>
Many products involve what can be called a
`Statistical Model'.
This term is taken to include all of
- probability distribution,
- model,
- regression,
- model class
(statistics sense of `class'),
- hypothesis,
- theory (not theorem),
-- taken to be equivalent except perhaps for a degree of scale or ambition.
<<
[04]
>>
The following are taken to be equivalent
- estimating the parameter(s) of a model,
- fitting a model to data,
- choosing a model class,
- inferring a hypothesis,
- inferring a theory,
except perhaps for a degree of scale or ambition.
<<
[05]
>>
Historical, Special-Case Example
procedure Align
  ( var S1 :Sequence;
    procedure m1( var overhead :real; var d :Distribution;
                  var S :Sequence; posn :integer );
    var S2 :Sequence;
    procedure m2( var overhead :real; var d :Distribution;
                  var S :Sequence; posn :integer )
  ); . . .
begin
  . . .
end

m1 and m2 are models
(of sequences, i.e. TimeSeries) --
hence models as parameters,
models as first-class values.
In effect, Align computes a model of a pair of sequences.
Align operates on models.
L. Allison, D. Powell & T. I. Dix.
Compression and approximate matching.
Computer Journal 42(1), pp1-10, 1999.
<<
[06]
>>
Lesson and Inspiration
- Treat models as first-class values.
- i.e. as functions are treated
in Functional Programming (FP)
- Haskell 98
- Static, parametric, polymorphic types
(as in Standard ML, SML).
- Type inference algorithm.
- Type classes.
- Lazy evaluation.
- And then see what happens.
<<
[07]
>>
Models.
MMLFP = MML + FP
The most important property of a (class of) statistical model is ``pr'':
class Model mdl where
  pr   :: (mdl dataSpace) -> dataSpace -> Probability
  msg2 :: (mdl dataSpace) -> dataSpace -> MessageLength   -- (2nd part)
  msg  :: . . . (mdl dataSpace) -> dataSpace -> MessageLength
  -- a minimum; maybe (probably!) a Model can also do other things.
<<
[08]
>>
A ModelType
data ModelType dataSpace =
    MPr  MessageLength (dataSpace -> Probability)
  | MMsg MessageLength (dataSpace -> MessageLength)

instance Model ModelType where
  pr (...) datum = . . .
  etc.
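The class and data type above can be made concrete along the following lines. This is only a sketch under stated assumptions: Probability and MessageLength are taken to be Double, message lengths are measured in nits (natural logarithms), and the default methods and the fairCoin example are hypothetical, added for illustration. The elided msg method is left out.

```haskell
type Probability   = Double
type MessageLength = Double   -- assumed unit: nits (natural logarithms)

class Model mdl where
  pr   :: mdl dataSpace -> dataSpace -> Probability
  msg2 :: mdl dataSpace -> dataSpace -> MessageLength
  -- Assumed defaults: each method is definable from the other.
  pr   m d = exp (negate (msg2 m d))
  msg2 m d = negate (log (pr m d))

-- The first field is the length of the first part of the message
-- (the model itself); the second gives the data's probability or length.
data ModelType dataSpace
  = MPr  MessageLength (dataSpace -> Probability)
  | MMsg MessageLength (dataSpace -> MessageLength)

instance Model ModelType where
  pr   (MPr  _ p) datum = p datum
  pr   (MMsg _ m) datum = exp (negate (m datum))
  msg2 (MPr  _ p) datum = negate (log (p datum))
  msg2 (MMsg _ m) datum = m datum

-- Hypothetical example: a parameterless model of a fair coin.
fairCoin :: ModelType Bool
fairCoin = MPr 0 (const 0.5)
```

Loaded into an interpreter, pr fairCoin True gives 0.5 and msg2 fairCoin True gives log 2 nits, i.e. one bit.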
<<
[09]
>>
Some Model Examples
wallaceIntModel  -- Model of non-neg' Int, [0.. ]
normal m s       -- Model of Float
freqs2model      -- [Int] -> Model of [0..n-1]
modelInt2model   -- ds -> (Model of Int) -> (Model of ds)
                    (ds bounded & discrete)
bivariate        -- (Model of d1) -> (Model of d2) -> Model of (d1,d2)
estBivariate     -- ([d1] -> Model of d1) -> ([d2] -> Model of d2)
                    -> ([(d1,d2)] -> Model of (d1,d2))
NB. Slight abuse of Haskell type notation.
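Three of the entries above might be realised as follows. This is a sketch, not the seminar's code: it assumes freqs2model simply normalises counts, that bivariate treats the two components as independent, and that first-part message lengths add. The cost of stating the frequencies is set to a placeholder 0, and the helper functions pr and msg1 are simplified stand-ins for the class methods.

```haskell
type Probability   = Double
type MessageLength = Double

-- Simplified ModelType: first-part length plus a probability function.
data ModelType ds = MPr MessageLength (ds -> Probability)

pr :: ModelType ds -> ds -> Probability
pr (MPr _ p) = p

msg1 :: ModelType ds -> MessageLength
msg1 (MPr m _) = m

-- Assumption: counts are normalised to probabilities over [0..n-1];
-- the first-part length is a placeholder 0 here.
freqs2model :: [Int] -> ModelType Int
freqs2model fs = MPr 0 p
  where
    total = fromIntegral (sum fs)
    p i | i >= 0 && i < length fs = fromIntegral (fs !! i) / total
        | otherwise               = 0

-- Assumption: the two components are modelled independently.
bivariate :: ModelType d1 -> ModelType d2 -> ModelType (d1, d2)
bivariate m1 m2 =
  MPr (msg1 m1 + msg1 m2) (\(x, y) -> pr m1 x * pr m2 y)

-- Lift two estimators to an estimator of pairs.
estBivariate :: ([d1] -> ModelType d1) -> ([d2] -> ModelType d2)
             -> [(d1, d2)] -> ModelType (d1, d2)
estBivariate est1 est2 pairs = bivariate (est1 xs) (est2 ys)
  where (xs, ys) = unzip pairs
```

The point of the design is that estBivariate never inspects d1 or d2: any pair of estimators of the right types can be combined.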
<<
[10]
>>
FunctionModels and TimeSeries
Other classes of statistical model:
class FunctionModel fm where
  condModel :: (fm inSpace opSpace) -> inSpace -> ModelType opSpace
  condPr    :: (fm inSpace opSpace) -> inSpace -> opSpace -> Probability
  condMsg2  :: (fm inSpace opSpace) -> inSpace -> opSpace -> MessageLength

class TimeSeries tsm where
  predictors :: (tsm dataSpace) -> [dataSpace] -> [ModelType dataSpace]
  prs        :: (tsm dataSpace) -> [dataSpace] -> [Probability]
  msg2s      :: (tsm dataSpace) -> [dataSpace] -> [MessageLength]
More? (Surely!)
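One way the two classes might hang together is for condModel and predictors to be the primitive methods, with the others derived from them. The sketch below assumes exactly that. The default definitions, the instance types Lookup and IID, and the examples coin and noisyCopy are all hypothetical, and ModelType is simplified to a single constructor.

```haskell
type Probability   = Double
type MessageLength = Double

data ModelType ds = MPr MessageLength (ds -> Probability)

pr :: ModelType ds -> ds -> Probability
pr (MPr _ p) = p

class FunctionModel fm where
  condModel :: fm inSpace opSpace -> inSpace -> ModelType opSpace
  condPr    :: fm inSpace opSpace -> inSpace -> opSpace -> Probability
  condMsg2  :: fm inSpace opSpace -> inSpace -> opSpace -> MessageLength
  -- Assumed defaults: derive both from the conditional model.
  condPr   f i o = pr (condModel f i) o
  condMsg2 f i o = negate (log (condPr f i o))

class TimeSeries tsm where
  predictors :: tsm ds -> [ds] -> [ModelType ds]
  prs        :: tsm ds -> [ds] -> [Probability]
  msg2s      :: tsm ds -> [ds] -> [MessageLength]
  -- Assumed defaults: score each datum under its own predictor.
  prs   t xs = zipWith pr (predictors t xs) xs
  msg2s t xs = map (negate . log) (prs t xs)

-- Hypothetical instance: a function model given directly as a lookup.
newtype Lookup i o = Lookup (i -> ModelType o)
instance FunctionModel Lookup where
  condModel (Lookup f) i = f i

-- Hypothetical instance: every element predicted by the same model.
newtype IID ds = IID (ModelType ds)
instance TimeSeries IID where
  predictors (IID m) xs = map (const m) xs

coin :: Probability -> ModelType Bool
coin p = MPr 0 (\b -> if b then p else 1 - p)

noisyCopy :: Lookup Bool Bool
noisyCopy = Lookup (\i -> coin (if i then 0.9 else 0.1))
```

For example, condPr noisyCopy True True is 0.9, and prs (IID (coin 0.5)) gives 0.5 for every element of any Boolean sequence.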
<<
[11]
>>
SuperModels
Our statistical models have some common properties;
we need a super-class. Obviously...
class SuperModel sMdl where
  prior   :: sMdl -> Probability
  msg1    :: sMdl -> MessageLength
  mixture :: (Mixture mx, SuperModel (mx sMdl)) => mx sMdl -> sMdl

class Mixture mx where
  mixer      :: (SuperModel t) => mx t -> ModelType Int
  components :: (SuperModel t) => mx t -> [t]

instance SuperModel (ModelType dataSpace) where   -- (as promised)
  msg1 (MPr mdlLen p) = mdlLen
  . . . etc.
<<
[12]
>>
Conversion Functions
[e.g.]
(There are also corresponding conversions on estimators.)
<<
[13]
>>
Mixture modelling (clustering,
unsupervised classification,
Snob,...)
estMixture ests dataSet = let
    memberships (Mix mixer components) = ...
    randomMemberships = ...
    fit [] [] = []
    fit (est:ests) (mem:mems) = (est dataSet mem):(fit ests mems)
    fitMixture mems = Mix (freqs2model (map (foldl (+) 0) mems))
                          (fit ests mems)
    cycle mx = fitMixture (memberships mx)
    cycles 0 mx = mx
    cycles n mx = cycles (n-1) (cycle mx)
  in mixture(cycles <n> (fitMixture randomMemberships))
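The elided memberships step could plausibly be the usual fractional assignment: each datum is shared among the components in proportion to (mixing weight) x (component probability). The sketch below assumes that reading. It takes the mixer and components as separate arguments rather than a Mix value, and the simplified ModelType, the coin example and the componentWeights helper are hypothetical.

```haskell
import Data.List (transpose)

type Probability   = Double
type MessageLength = Double

data ModelType ds = MPr MessageLength (ds -> Probability)

pr :: ModelType ds -> ds -> Probability
pr (MPr _ p) = p

-- Result: one list per component, holding that component's fractional
-- share of each datum (the shape that fit and fitMixture consume).
memberships :: ModelType Int -> [ModelType ds] -> [ds] -> [[Double]]
memberships mixer comps dataSet = transpose (map posterior dataSet)
  where
    posterior d = map (/ total) joint
      where
        -- weight of component k times its probability of datum d
        joint = zipWith (\k c -> pr mixer k * pr c d) [0 ..] comps
        total = sum joint

-- Total weight per component, as in map (foldl (+) 0) mems.
componentWeights :: [[Double]] -> [Double]
componentWeights = map (foldl (+) 0)

-- Hypothetical component model for the example.
coin :: Probability -> ModelType Bool
coin p = MPr 0 (\b -> if b then p else 1 - p)
```

With equal mixing weights and components coin 0.9 and coin 0.1, the datum True is shared roughly 0.9 to 0.1. Note that the weights are fractional, so passing their sums to a freqs2model of type [Int] -> ... is part of the slight abuse of notation.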
<<
[14]
>>
Classification- (decision-) -trees
(supervised classification, C5,...)
<<
[15]
>>
Tree Estimator / Search
estCTree estLeafMdl splits ipSet opSet = let
    search ipSet opSet = let
        leaf    = CTleaf leafMdl      -- simplest tree
        leafMdl = estLeafMdl opSet    -- NB. any leaf Model
        leafMsg = ...
        partition arity pFn ipSet opSet = ...
        alternatives ... = ...
      in case alternatives (splits ipSet) leafMsg leaf [ipSet] [opSet]
         of (CTfork ...) -> ...       -- search for subtrees, or
            (t, _, _)    -> t         -- single leaf, done.
  in search ipSet opSet
<<
[16]
>>
Generality
More than a classification-tree, e.g.
estFunctionModel2estModel estFn ipOpPairs =
  functionModel2model (uncurry estFn (unzip ipOpPairs))

ft = estCTree (estFunctionModel2estModel estFiniteFunction)   -- e.g.
              splits
              trainingIp trainingOp

-- in effect a FunctionModel-tree, i.e. a regression-tree.
NB. Can use other estimators than estFiniteFunction.
(Similarly, FunctionModel-mixtures.)
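The conversion of an estimator can be sketched in a few lines. The definition of estFunctionModel2estModel is the one given above. Everything else is assumed for illustration: the FnModel type, the reading of functionModel2model as a model of (input, output) pairs in which the input is taken as given, and the toy estimator estBoolFn, which stands in for estFiniteFunction.

```haskell
type Probability   = Double
type MessageLength = Double

data ModelType ds = MPr MessageLength (ds -> Probability)

pr :: ModelType ds -> ds -> Probability
pr (MPr _ p) = p

-- Hypothetical function-model representation.
newtype FnModel ip op = FnModel (ip -> ModelType op)

-- Assumption: the input is common knowledge, so a pair costs only
-- the output given the input.
functionModel2model :: FnModel ip op -> ModelType (ip, op)
functionModel2model (FnModel f) = MPr 0 (\(i, o) -> pr (f i) o)

-- As on the slide: unzip the pairs and hand both lists to estFn.
estFunctionModel2estModel :: ([ip] -> [op] -> FnModel ip op)
                          -> [(ip, op)] -> ModelType (ip, op)
estFunctionModel2estModel estFn ipOpPairs =
  functionModel2model (uncurry estFn (unzip ipOpPairs))

-- Toy estimator (hypothetical): ignores the inputs and estimates
-- P(True) from the outputs with add-one smoothing.
estBoolFn :: [ip] -> [Bool] -> FnModel ip Bool
estBoolFn _ ops = FnModel (const (MPr 0 p))
  where
    n, trues, pTrue :: Double
    n     = fromIntegral (length ops)
    trues = fromIntegral (length (filter id ops))
    pTrue = (trues + 1) / (n + 2)
    p b   = if b then pTrue else 1 - pTrue
```

The result of estFunctionModel2estModel estBoolFn has the type of an ordinary estimator of a Model, which is what lets it be passed to estCTree as the leaf estimator.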
<<
[17]
Conclusions
A good spring collection
- Models, e.g. probability distributions,
mixtures (unsupervised classification).
- FunctionModels, e.g. curve fitting,
regressions, classification trees (supervised classification),
regression trees.
- TimeSeries, e.g. Markov models.
- Operators and conversion functions on the above.
- General, e.g. estimate a mixture of FunctionModels,
estimate a FunctionModel- (regression-) -tree,
etc.
- Have a model of modelling:
A theory,
usable in its own right,
a rapid-prototype for a data mining platform.