Notes for: Types and Classes of Machine Learning and Data Mining

L. Allison, 26th Australasian Computer Science Conference ACSC2003, pp207-215, Adelaide, 2003


On Humphrey

Humphrey's essay is a beautiful classic on beauty. It has been scanned for the web; searching for its title should find it.


On Haskell

Haskell is a functional programming (FP) language with these features:
first-class functions: functions are values, and we want statistical models to be first-class values too,
lazy evaluation: programs can compute with some infinite structures,
polymorphic types: for uniform (parametric) polymorphism,
type-classes: for overloading (ad-hoc polymorphism),
type inference: the compiler infers types, so the programmer rarely states them.
e.g.
map f [] = []   -- base case, empty list
map f (x:xs) = (f x):(map f xs)   -- general case
map :: (t->u) -> [t] -> [u]   -- type inferred automatically
posInts = 1 : (map (\x -> x+1) posInts)   -- [1, 2, 3, ...]
 
See: S. Peyton Jones et al. Report on the Programming Language Haskell 98. 1 Feb 1999, and www.haskell.org

On MML

Minimum Message Length (MML) framework:
use of this framework is "orthogonal" to the main points of the paper, but
MML is invariant under re-parameterisation, statistically consistent, and resistant to over-fitting, and
it is (very) compatible with composing sub-models to form models.

On other classes

A Model is "like" a value,
 
a FunctionModel is like a function (->)
(e.g. a linear regression),
 
a TimeSeries is like a list ([...])
(e.g. a hidden Markov model).
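
The three analogies above can be sketched as Haskell type classes. This is only an illustration: the method names (pr, condPr, stepPr), the toy types (Table, Cond, Order1) and the example values are all made up here and are not the paper's definitions.

```haskell
-- Hypothetical sketch: every name below is illustrative only.

-- A Model is "like" a value: it assigns a probability to each datum.
class Model m where
  pr :: m d -> d -> Double

-- A FunctionModel is "like" a function: given an input, it assigns a
-- probability to each output.
class FunctionModel fm where
  condPr :: fm ip op -> ip -> op -> Double

-- A TimeSeries is "like" a list: it assigns a probability to each element
-- given the elements that came before it.
class TimeSeries ts where
  stepPr :: ts d -> [d] -> d -> Double

-- Toy instance: a model given directly by its probability function.
newtype Table d = Table (d -> Double)

instance Model Table where
  pr (Table f) = f

-- Toy instance: a function-model that picks a Table for each input.
newtype Cond ip op = Cond (ip -> Table op)

instance FunctionModel Cond where
  condPr (Cond f) ip = pr (f ip)

-- Toy instance: a first-order Markov series; the next element depends only
-- on the previous one (Nothing at the start of the series).
newtype Order1 d = Order1 (Maybe d -> Table d)

instance TimeSeries Order1 where
  stepPr (Order1 f) history = pr (f (lastMaybe history))
    where
      lastMaybe [] = Nothing
      lastMaybe xs = Just (last xs)

-- Probability of a whole sequence: the product of the step probabilities.
seqPr :: TimeSeries ts => ts d -> [d] -> Double
seqPr ts xs = product [ stepPr ts (take i xs) x | (i, x) <- zip [0 ..] xs ]

-- Example values.
fairCoin :: Table Bool
fairCoin = Table (const 0.5)

noisyNot :: Cond Bool Bool
noisyNot = Cond (\i -> Table (\o -> if o /= i then 0.8 else 0.2))

sticky :: Order1 Bool
sticky = Order1 next
  where
    next Nothing  = fairCoin
    next (Just p) = Table (\x -> if x == p then 0.9 else 0.1)
```

For instance, seqPr sticky [True, True, False] multiplies the step probabilities 0.5, 0.9 and 0.1.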

On estMixture

parameter 1
A list [...] of weighted estimators, one per component.
A plain estimator takes a data-set, i.e. [dataSpace], and returns a Model.
A weighted estimator takes a data-set and a list of weights, one per datum (for fractional assignment of data to components, which gives unbiased results), and produces a Model,
i.e. [[dataSpace] -> [Float] -> Model of dataSpace]
-- roughly (Model is a class not a type).
parameter 2
a data-set, i.e. [dataSpace].
Result
a mixture-model of the dataSpace
 
(Types inferred by the compiler in any case.)
 
Algorithm - see paper.
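
To make the shape of parameter 1 concrete, here is a minimal sketch of one weighted estimator. It is not the paper's code: the newtype Model is a stand-in for "Model of dataSpace", the names estBoolWeighted, unweighted and componentEstimators are hypothetical, the add-one smoothing is an arbitrary choice, and Double is used where the notes say Float.

```haskell
-- Hypothetical stand-in for "Model of dataSpace": just a probability function.
newtype Model d = Model { pr :: d -> Double }

-- A plain estimator:     [dataSpace] -> Model of dataSpace
type Estimator d = [d] -> Model d

-- A weighted estimator:  [dataSpace] -> [weight] -> Model of dataSpace
type WeightedEstimator d = [d] -> [Double] -> Model d

-- Weighted estimator for a two-state (Bool) model.
-- Each datum counts fractionally, according to its weight. Adding 1 to each
-- count keeps the probabilities away from 0 (an arbitrary smoothing choice).
estBoolWeighted :: WeightedEstimator Bool
estBoolWeighted xs ws = Model p
  where
    wTrue  = sum [ w | (x, w) <- zip xs ws, x ]
    wFalse = sum [ w | (x, w) <- zip xs ws, not x ]
    pTrue  = (wTrue + 1) / (wTrue + wFalse + 2)
    p x    = if x then pTrue else 1 - pTrue

-- A plain estimator is the special case in which every weight is 1.
unweighted :: WeightedEstimator d -> Estimator d
unweighted est xs = est xs (map (const 1) xs)

-- The first parameter of a mixture estimator would be a list like this one,
-- with one weighted estimator per component.
componentEstimators :: [WeightedEstimator Bool]
componentEstimators = [estBoolWeighted, estBoolWeighted]
```

A datum with weight 0 contributes nothing to a component's model, and a datum with weight 0.5 counts as half an observation, which is what fractional assignment requires.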

On estCTree

parameter 1
an estimator for a leaf Model
i.e. ( [opSpace] -> Model of opSpace )
parameter 2
a function that produces a list of `ways of partitioning' the input space
i.e. ipSpace -> [ . . . ],
a way of partitioning being a function
ipSpace -> [Int]     -- roughly.
parameters 3 and 4
the input and output training data
[ipSpace]  & [opSpace]  respectively.
Result
a CTree (which is a FunctionModel) of ipSpace and opSpace.
 
(Types inferred by the compiler in any case.)
 
Algorithm - see paper.
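
As an illustration of parameter 2, the sketch below generates candidate ways of partitioning a numeric input space. It makes simplifying assumptions that are not taken from the paper: candidate splits are derived from the observed inputs, each way of partitioning maps a single input to a part number, and the names Splitter, thresholdSplits and partitionBy are hypothetical.

```haskell
import Data.List (nub, sort)

-- Assumption: a way of partitioning maps an input to the index of the part
-- it falls in (0, 1, ...).
type Splitter ip = ip -> Int

-- Candidate binary splits for a numeric input: one threshold half-way
-- between each pair of adjacent distinct observed values.
thresholdSplits :: [Double] -> [Splitter Double]
thresholdSplits xs = [ (\x -> if x < t then 0 else 1) | t <- cuts ]
  where
    vs   = nub (sort xs)
    cuts = zipWith (\a b -> (a + b) / 2) vs (drop 1 vs)

-- Divide the output training data according to the part that the
-- corresponding input falls in; a leaf estimator could then be applied
-- to each part separately.
partitionBy :: Int -> Splitter ip -> [ip] -> [op] -> [[op]]
partitionBy nParts split ips ops =
  [ [ o | (i, o) <- zip ips ops, split i == k ] | k <- [0 .. nParts - 1] ]
```

With inputs [1, 3, 3, 5] the candidate thresholds are 2 and 4, so thresholdSplits returns two splitters.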

On generality

estFunctionModel2estModel
takes an (unweighted) estimator for a FunctionModel ip op and
returns an estimator for a Model (ip, op).
i.e. takes [ip] -> [op] -> FunctionModel of ip and op,
returns [ (ip, op) ] -> Model of (ip, op).
 
NB. unzip turns a list of pairs into a pair of lists
(i.e. it rearranges the training data),
and uncurry f (x, y) = f x y .

The important point is that
estCTree + a two-line function = a new application area.
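
A minimal sketch of how unzip and uncurry could combine into such a two-line function is given here. The types Model and FunctionModel are hypothetical stand-ins (in the paper they are classes), and functionModel2model and estIgnoreInput are illustrative helpers; in particular, scoring a pair by the conditional probability of its output alone is an assumption of this sketch.

```haskell
-- Hypothetical stand-ins for the paper's classes.
newtype Model d = Model { pr :: d -> Double }
newtype FunctionModel ip op = FunctionModel { condModel :: ip -> Model op }

-- View a function-model as a model of (input, output) pairs.
-- Assumption: the input is treated as given, so a pair is scored by the
-- conditional probability of its output.
functionModel2model :: FunctionModel ip op -> Model (ip, op)
functionModel2model fm = Model (\(i, o) -> pr (condModel fm i) o)

-- The two-line function: unzip rearranges the training pairs into a pair of
-- lists, and uncurry feeds that pair to the two-argument estimator.
estFunctionModel2estModel
  :: ([ip] -> [op] -> FunctionModel ip op) -> [(ip, op)] -> Model (ip, op)
estFunctionModel2estModel estFn ipOpPairs =
  functionModel2model (uncurry estFn (unzip ipOpPairs))

-- Toy function-model estimator for testing: it ignores the inputs and
-- estimates the frequency of True outputs, with add-one smoothing.
estIgnoreInput :: [ip] -> [Bool] -> FunctionModel ip Bool
estIgnoreInput _ ops = FunctionModel (const (Model p))
  where
    nTrue = fromIntegral (length (filter id ops)) :: Double
    n     = fromIntegral (length ops)
    pTrue = (nTrue + 1) / (n + 2)
    p o   = if o then pTrue else 1 - pTrue
```

Any estimator with the same shape as estIgnoreInput, such as a tree estimator once its first two parameters are supplied, could be passed in the same way.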



© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)