[01]
>>
Lloyd Allison,
School of Computer Science and Software Engineering,
Monash University,
Australia 3800.
Abstract:
The functional programming language, Haskell,
is applied to the area of
artificial-intelligence/ data-mining/ inductive-inference/ machine-learning/
(whatever). Haskell types and type-classes are defined
to specify various kinds of, for want of a name,
statistical model as used in that area.
Basic models (distributions) are like values,
function-models (regressions) are like functions, and
time-series are a little like lists.
Convenient operators and conversion functions
allow new models to be built out of building blocks
(-- in the usual functional programming style).
Haskell makes a good tool to examine formally the types and behaviour
of models.
The result is a sort of theory of programming with models
(or a rapid prototype for a data analysis system),
which of course also compiles and runs.
Some (simple) algorithms for classic machine-learning problems are
given to support the claim that the approach is realistic.
Presented to
Dept. of Computer Science and Software Engineering
(CSSE),
111 Barry St.,
Melbourne University,
Victoria, Australia,
21 October 2003.
This
document can be found at
users.monash.edu.au/~lloyd/Seminars/20031021-MU/index.shtml
and includes hyper-links to other resources.
<<
[02]
>>
Groundhog days . . .
- repeat
- New, bright student begins PhD in machine-learning
(often using
MML
under CSW or associate),
-
- devises a new "statistical model" for analysing data from some problem,
- does some
difficult Maths
on estimator(s) for the model,
-
- implements the
estimator in C (usual at Monash) or similar,
reinventing data input
(in a slightly new format, of course) &
display of results (ditto),
-
- resulting in a program that is just robust enough to run enough
tests and comparisons
for 2 or 3 papers and the thesis,
-
- gets PhD and departs
- until ???; ...
<<
[03]
>>
- ... we are left with many programs that
-
- have little documentation,
-
- are not robust in extreme cases,
-
- cannot easily cooperate with each other or
be used to build new models, and
-
- are not easy to generalize.
(NB. Not the students' fault, and
there are exceptions.)
<<
[04]
>>
questions arising
-
- What are the products of machine-learning research?
(of AI, data mining, computational statistics,...)
-
- How [ do | could ] they behave?
-
- What can you do to them?
-
- How can two or more of them be combined?
-
-
- What are their types and classes?
-
-
(cf. S-Plus, R, Weka, etc.)
<<
[05]
>>
Inspiration (coals to Newcastle slide)
- Functional Programming, i.e. programming with functions
-
- lambda-calculus, Lisp, . . ., SML, and Haskell-98
-
- high-order functions (e.g. map, fold, zip, . . . etc.)
-
- lazy-evaluation (e.g. ints = 0 : (map ((+) 1) ints))
-
- polymorphic types
(e.g. map :: (t->u) -> [t] -> [u])
-
- automatic type-inference
-
- type-classes
-
- type checked & compiled (fast)
-
- So, trying to create ``programming with (statistical) models''
here (using Haskell-98)...
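For concreteness, the lazy-evaluation and polymorphic-type examples
above as runnable Haskell (a minimal sketch; myMap is just the
Prelude's map re-spelt to show the slide's type):

```haskell
-- an infinite, lazily evaluated list of the naturals, as on the slide
ints :: [Integer]
ints = 0 : map (1 +) ints

-- a higher-order, polymorphic function at the slide's type
-- (t -> u) -> [t] -> [u]; the Prelude's map, re-defined for illustration
myMap :: (t -> u) -> [t] -> [u]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
```

Laziness means only as much of ints is computed as is demanded,
e.g. by take 5 ints.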
<<
[06]
>>
... programming with models
- we get:
-
- high-order (functions of) models
- e.g. estimate Classification-Tree parameterized by (estimate Leaf Model)
- e.g. Classification-Tree -> Regression-Tree
-
- polymorphic typed (& type-checked) models
- e.g. a Model of some Bounded Enum <data-space>, say
- e.g. FunctionModel of <in-space> to <op-space>
-
- classes of statistical model
- e.g. (basic) Model, FunctionModel, TimeSeries, . . ., ¿other?
-
- (and lazy evaluation is useful in some algorithms)
NB. Some slight abuse of the type system
above and following.
<<
[07]
>>
- class Model mdl where
- pr ::
(mdl dataSpace) -> dataSpace -> Probability
- nlPr :: (mdl dataSpace) -> dataSpace -> MessageLength
- msg :: SuperModel (mdl dataSpace) =>
- (mdl dataSpace) -> [dataSpace] -> MessageLength
- msg2 :: (mdl dataSpace) -> [dataSpace] -> MessageLength
- . . .
-
- e.g.
- data ModelType dataSpace =
- MPr
MessageLength (dataSpace -> Probability) ... |
- MnlPr MessageLength (dataSpace -> MessageLength) ...
-
- instance Model ModelType where
- nlPr (MPr . . . ) = . . .
- nlPr (MnlPr . . . ) = . . .
- . . .
(Explain a little about
MML,
information, over-fitting and stopping criteria.)
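The elided class can be rounded out into a self-contained sketch;
taking Probability and MessageLength as Double and omitting msg/msg2
are simplifying assumptions here, not the talk's full code:

```haskell
type Probability   = Double
type MessageLength = Double   -- negative log probability

class Model mdl where
  pr   :: mdl dataSpace -> dataSpace -> Probability
  nlPr :: mdl dataSpace -> dataSpace -> MessageLength
  -- default definitions, each in terms of the other
  pr   m d = exp (negate (nlPr m d))
  nlPr m d = negate (log (pr m d))

-- a concrete representation, fields simplified
data ModelType dataSpace =
    MPr   MessageLength (dataSpace -> Probability)
  | MnlPr MessageLength (dataSpace -> MessageLength)

instance Model ModelType where
  pr   (MPr   _ p) d = p d
  pr   (MnlPr _ n) d = exp (negate (n d))
  nlPr (MPr   _ p) d = negate (log (p d))
  nlPr (MnlPr _ n) d = n d

-- e.g. a uniform model over the six faces of a die
uniform6 :: ModelType Int
uniform6 = MPr 0 (const (1 / 6))
```

The mutual default methods mean an instance need only supply
whichever of pr and nlPr is more natural.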
<<
[08]
>>
e.g.
wallaceIntModel =
let catalans =
let cats last n =
. . .
. . .
in 1 : (cats 1 1)
cumulativeCatalans
= scanl1 (+) catalans
find n posn (e:es) = . . .
in MnlPr 0 (\n -> ((find . . .
 n   | 0   | 1   | 2     | 3     | 4       | ...
code | 0   | 100 | 10100 | 11000 | 1010100 | ...
 pr  | 1/2 | 1/8 | 1/32  | . . .
-- a (universal) Model of non-negative Ints
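The elided definitions can be filled in along these lines (a sketch,
not Wallace's exact code-word construction: only code lengths and the
resulting probabilities are computed):

```haskell
-- Catalan numbers 1, 1, 2, 5, 14, ...: the number of code words
-- of each odd length in the tree code for non-negative Ints
catalans :: [Integer]
catalans = 1 : zipWith next catalans [0 ..]
  where next c k = c * 2 * (2 * k + 1) `div` (k + 2)

cumulativeCatalans :: [Integer]
cumulativeCatalans = scanl1 (+) catalans

-- length, in bits, of the code word for n: 2k+1 where k is the
-- least position whose cumulative Catalan count exceeds n
codeLen :: Integer -> Int
codeLen n = 2 * length (takeWhile (<= n) cumulativeCatalans) + 1

-- probability 2^(-codeLen n): 1/2, 1/8, 1/32, 1/32, 1/128, ...
prInt :: Integer -> Double
prInt n = 0.5 ^ codeLen n
```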
<<
[09]
>>
example operators on Models
- bivariate (m1, m2) =
- let
- negLogPr (d1, d2)
= (nlPr m1 d1) + (nlPr m2 d2)
- in MnlPr ((msg1 m1)+(msg1 m2))
negLogPr
. . .
-- similarly bivariate estimators etc.
- mixture ::
(Mixture mx, SuperModel (mx sMdl)) =>
- mx sMdl -> sMdl
-- a weighted average of Models, and
also of Time-Series and of Function-Models
(i.e. of SuperModels)
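A self-contained cut-down of bivariate, with msg1 (the first part of
a model's message length) taken as a plain field and ModelType
simplified, both assumptions:

```haskell
type MessageLength = Double

data ModelType t = MnlPr MessageLength (t -> MessageLength)

msg1 :: ModelType t -> MessageLength
msg1 (MnlPr m _) = m

nlPr :: ModelType t -> t -> MessageLength
nlPr (MnlPr _ f) = f

-- the product model over pairs: probabilities multiply,
-- so negative log probabilities (message lengths) add
bivariate :: (ModelType t1, ModelType t2) -> ModelType (t1, t2)
bivariate (m1, m2) =
  MnlPr (msg1 m1 + msg1 m2)
        (\(d1, d2) -> nlPr m1 d1 + nlPr m2 d2)
```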
<<
[10]
>>
examples
- estMultiState :: ... => [t] -> ModelType t
-- estimator for Enum Bounded t
-
- estBivariate estMultiState estMultiState
-- estimator of two coins, say
-
- instance
(Enum t1,Bounded t1, Enum t2,Bounded t2)
=>
Enum (t1,t2)
where...
-- allows *MultiState to apply directly to pairs
-- e.g. of coins,
NB. not equivalent to the previous example.
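estMultiState can be sketched as a (Laplace-smoothed)
relative-frequency estimator; returning a bare pr function and
dropping the message-length part of the real estimator are
simplifications:

```haskell
-- estimate a distribution over a Bounded Enum type from a sample,
-- smoothing counts by 0.5 so unseen values keep non-zero probability
estMultiState :: (Eq t, Enum t, Bounded t) => [t] -> (t -> Double)
estMultiState xs x = (count + 0.5) / (total + 0.5 * k)
  where
    k     = fromIntegral (length ([minBound .. maxBound] `asTypeOf` xs))
    count = fromIntegral (length (filter (== x) xs))
    total = fromIntegral (length xs)
```

The asTypeOf trick just pins [minBound .. maxBound] to the sample's
element type.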
<<
[11]
>>
Mixture Modelling (clustering)
- estMixture ests dataSet = let
- memberships (Mix . . . ) =
-   . . .
- fitMixture mems =
-   . . .
- in mixture( cycles . . . (fitMixture randomMemberships) )
-
- estMixture ::
- [ [dataSpace] -> [Float] -> Model of dataSpace ]
-- wtd estimators
- -> [ dataSpace ]
-- training data
- -> (Mixture) Model of dataSpace
unsupervised classification, clustering,
yes it works (slight abuse of types above).
NB. Mixture modelling chosen as an example `only'
because it is important in AI, ML, DM, CSc.
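The elided memberships and fitMixture steps amount to one EM-style
pass; a sketch under simplified assumptions (components as plain pr
functions, and only the mixing weights re-fitted):

```haskell
-- E-step: each datum's posterior membership in each component,
-- given mixing weights and the components' pr functions
memberships :: [Double] -> [t -> Double] -> [t] -> [[Double]]
memberships ws ps xs =
  [ let joint = zipWith (\w p -> w * p x) ws ps
    in map (/ sum joint) joint
  | x <- xs ]

-- M-step for the weights only: average membership per component
fitWeights :: [[Double]] -> [Double]
fitWeights mems = map (/ fromIntegral (length mems))
                      (foldr1 (zipWith (+)) mems)
```

estMixture iterates such passes (cycles) from random memberships.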
<<
[12]
>>
Function Models
- class FunctionModel fm where
- condModel :: (fm inSpace opSpace)
-> inSpace
-> ModelType opSpace
- condPr :: (fm inSpace opSpace)
-> inSpace
-> opSpace
-> Probability
- . . .
-
- data FunctionModelType inSpace opSpace =
FM MessageLength (inSpace -> ModelType opSpace)
. . .
-
- instance . . .
-- regressions, classification trees, etc..
<<
[13]
>>
e.g. Classification Trees
- data CTreeType
inSpace opSpace =
- CTleaf (ModelType opSpace) |
- CTfork MessageLength
(Splitter inSpace)
[CTreeType inSpace opSpace]
| . . . --
& maybe more
-
- instance FunctionModel CTreeType where
- condModel (CTleaf leafModel) i
= leafModel
- condModel (CTfork fnLen f dts) i
= condModel (dts !! (applySplitter f i)) i
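Runnable, with the Splitter simplified to a plain inSpace -> Int
function and ModelType cut down to a negative-log-probability field
(both assumptions):

```haskell
data ModelType opSpace =
  MnlPr Double (opSpace -> Double)            -- simplified Model

data CTreeType inSpace opSpace =
    CTleaf (ModelType opSpace)
  | CTfork Double (inSpace -> Int) [CTreeType inSpace opSpace]
      -- fork message length, splitter, subtrees

-- route the input down the tree to the leaf's Model
condModel :: CTreeType inSpace opSpace -> inSpace -> ModelType opSpace
condModel (CTleaf leafModel)   _ = leafModel
condModel (CTfork _ split dts) i = condModel (dts !! split i) i

nlPr :: ModelType opSpace -> opSpace -> Double
nlPr (MnlPr _ f) = f
```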
<<
[14]
>>
- estCTree estLeafMdl splits* ipSet opSet =
- let
- search ipSet opSet =
- let
- leaf = CTleaf leafMdl
-- simplest tree
- leafMdl = estLeafMdl opSet
- leafMsg = msg (functionModel2model leaf) (zip ipSet opSet)+
-
- alternatives . . . =
-- more complex trees
- . . .
-
- in case alternatives (splits ipSet) . . . leaf . . . in
- . . . return the best one . . .
-
- in search ipSet opSet
* One can
also define an interesting type-class, Splits, to give
the partitioning functions implicitly.
+ Some lazy programming!
<<
[15]
>>
Time-Series
- class TimeSeries tsm where
- predictors :: (tsm dataSpace)
-> [dataSpace]
-> [ModelType dataSpace]
- prs :: (tsm dataSpace)
-> [dataSpace]
-> [Probability]
- . . .
-
- data TimeSeriesType dataSpace
= TSM MessageLength
([dataSpace] -> ModelType dataSpace)
. . .
-
- instance . . .
e.g. Markov models, etc.
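An order-1 Markov chain fits the interface above; a sketch with
simplified types, where prs follows from predictors by applying each
predicted model to the corresponding datum:

```haskell
data ModelType t = MnlPr Double (t -> Double)   -- simplified Model

-- an order-1 Markov time-series: a model of the first element,
-- and a transition function from the previous element to a model
data TimeSeriesType t = TSM1 (ModelType t) (t -> ModelType t)

-- one predictive Model per position, conditioned on the context
predictors :: TimeSeriesType t -> [t] -> [ModelType t]
predictors (TSM1 first step) xs = first : map step xs

prs :: TimeSeriesType t -> [t] -> [Double]
prs tsm xs = zipWith prOf (predictors tsm xs) xs
  where prOf (MnlPr _ nl) x = exp (negate (nl x))
```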
<<
[16]
>>
Conversions ~ generality
- e.g. assuming the input-space data, i,
are "common knowledge" . . .
- functionModel2model fm =
MnlPr (msg1 fm)
(\(i, o) -> condNlPr fm i o)
-
- functionModel2model :: (FunctionModel fm) =>
- fm ipSpace opSpace ->
ModelType (ipSpace, opSpace)
-
- similarly . . .
- estFunctionModel2estModel ::
- ( [ipSpace] -> [opSpace]
-> FunctionModel of ipSpace opSpace)
- -> ( [ (ipSpace, opSpace) ]
-> ModelType (ipSpace, opSpace))
e.g. This can be used to turn a classification-tree into a
FunctionModel- (regression-) tree.
(There are more conversion functions.)
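Self-contained, with ModelType and FunctionModelType cut down to
Double-based sketches (an assumption, not the talk's definitions):

```haskell
type MessageLength = Double

data ModelType t           = MnlPr MessageLength (t -> MessageLength)
data FunctionModelType i o = FM    MessageLength (i -> ModelType o)

condNlPr :: FunctionModelType i o -> i -> o -> MessageLength
condNlPr (FM _ f) i o = let MnlPr _ nl = f i in nl o

msg1FM :: FunctionModelType i o -> MessageLength
msg1FM (FM m _) = m

-- a FunctionModel of i -> o becomes a Model of pairs (i, o),
-- assuming the i values are common knowledge (cost 0 to state)
functionModel2model :: FunctionModelType i o -> ModelType (i, o)
functionModel2model fm =
  MnlPr (msg1FM fm) (\(i, o) -> condNlPr fm i o)
```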
<<
[17]
>>
Conclusion
- Haskell is very good (types & classes, high-order, lazy)
for M.L.,
- maybe it can serve both programming and
"scripting" purposes,
- beginning to understand programming with models
- (it is clear that "compositional" criteria other than
MML
could be substituted to deal with hypothesis- (model-) complexity).
- But
- some "tension" between wish for static v. dynamic types,
e.g. re I/O,
and
- some "natural" uses seem to require multi-parameter type-classes,
e.g. would like (m1,m2) to "be" a Pair of Models and also
a Model of Pairs.
(cf.
[functions].)
<<