A programming paradigm for machine learning with a case study of Bayesian networks

Lloyd Allison, Faculty of I.T., Monash University
ACSC2006, January 2006

Inductive programming (IP) is about components for machine learning -- studying, generalizing, defining, building, transforming and using them.

It employs Haskell and MML.

Here IP is illustrated by a case study of a learner for mixed Bayesian networks[*] (discrete and continuous variables) which is applied to a lost-person data-set.

Online at: users.monash.edu.au/~lloyd/Seminars/2006-ACSC/index.shtml 1/2006.


Lost-person application: an IP case study of BNs

Some variable types...

Age --continuous
Gender --unordered discrete (Bounded, Enum)
Topography = Mountains | Piedmont | Tidewater
--i.e. Bounded, Enum and Ord.

Network+data 9,700 nits v. null+data 10,400 nits approx. (parameterized).


@0:  Tipe       = Alzheimers | Child |...
@1:  Age
@2:  Race       = ...
@3:  Gender     = ...
@4:  Topography = as above
@5:  Urban      = ...
@6:  HrsNt
@7:  DistIPP
@8:  TrackOffset
@9:  Health     = Well | Hurt | Dead --after!
@10: Outcome    = Find | Suspended | Invalid
@11: FindRes    = Ground | Air |...| Dog
@12: FindLoc    = Brush | Woods |...
@13: HrsFind
@14: HrsTot

Components used in IP case study of Bayes-nets

Bayesian network:
  a DAG of nodes (nodebn), parents -> child
Classification tree:
  leaf (nodect) and split (nodect) nodes
Model of missing data,
  split on missing data
Model of a discrete variable,
  e.g. multinomial
Model of a continuous variable,
  e.g. Normal
Splits for Bounded, Enum types

[diagram: components stacked in a hierarchy of "height" h]

Components must be autonomous or must cooperate, e.g. to defeat over- (& under-) fitting, even as "height", h, increases.


For data D and model M,

Bayes: pr(M&D) = pr(M).pr(D|M) = pr(D).pr(M|D)
Shannon: msgLen(E) = -log(pr(E))
Wallace: msgLen(M&D) = msgLen(M)+msgLen(D|M) = msgLen(D)+msgLen(M|D)

Transmitter --( M ; D|M )--> Receiver   --a two-part message

(NB. MML is not in general identical to MAP,  e.g. continuous params.)
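As a concrete numerical sketch (illustrative only, not library code): message lengths in nits are just negative natural logs of probabilities, and the two-part length is their sum.

```haskell
-- hedged sketch: Shannon and Wallace message lengths in nits
msgLen :: Double -> Double
msgLen pr = negate (log pr)          -- msgLen(E) = -log pr(E)

-- msgLen(M&D) = msgLen(M) + msgLen(D|M)
twoPart :: Double -> Double -> Double
twoPart prM prDgivenM = msgLen prM + msgLen prDgivenM

main :: IO ()
main = print (twoPart 0.25 0.5)      -- = -log(1/4) - log(1/2) = log 8
```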

for completeness, the BN estimator (learner)

estNetwork perm estMV dataSet =
 let
  n = (length . selAll) (estMV [])

  search _  []     = []  --done
  search ps (c:cs) =     --parents ps, children c:cs
   let
    opFlag  = ints2flags [c] n  --child
    ipFlags = ints2flags ps  n  --parents
    cTree = estCTree            --see JFP 2005
              (estAndSelect estMV opFlag)
              (splitSelect ipFlags)
              dataSet dataSet
   in cTree : search (c:ps) cs

  trees  = search [] perm       --network
  msgLen = sum (map msg1 trees) --total msg1
  nlP datum = sum (map (\t -> condNlPr t datum datum) trees)
 in MnlPr msgLen nlP            --return a Model
          (\() -> "Net:" ++ show trees)

Some new IP machinery for BNs,  class Project

  class Project t where      --as in a Projection type
    select :: [Bool] -> t -> t

Lets us select certain variables (columns), e.g.

  data ModelMV dataSpace = MnlPrMV... --new Model type

  instance Model ModelMV where ...

  instance Project (ModelMV dataSpace) where ...

And also
  class Splits2 ds where
    splitSelect :: [Bool]->[ds]->[Splitter ds]

Model missing data,  Maybe t

  modelMaybe m1 m2 =
   let
    negLogPr (Just x) = nlPr m1 True + nlPr m2 x
    negLogPr Nothing  = nlPr m1 False
   in MnlPr (msg1 m1 + msg1 m2) negLogPr
            ...  --'show' method omitted

-- a useful inductive programming (IP) operator, applicable to any model m2, ...

... and estimators

  estModelMaybe estModel dataSet =
   let
    present (Just _) = True
    present Nothing  = False
    m1 = uniformModelOfBool          --[*]
    m2 = estModel (map (\(Just x) -> x)
                       (filter present dataSet))
   in modelMaybe m1 m2

(This is not the same as coding missing as a 'special value'.)
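A self-contained numerical sketch of the two-part cost (the names here, and the use of a Bernoulli presence model with a Normal, are assumptions for illustration): `Just x` costs "present" plus the cost of x; `Nothing` costs only "absent".

```haskell
-- hedged sketch of nlPr for a Maybe-model, costs in nits:
-- pPresent plays the role of m1, Normal(mu, sigma) the role of m2
nlPrMaybe :: Double -> (Double, Double) -> Maybe Double -> Double
nlPrMaybe pPresent (mu, sigma) (Just x) =
  negate (log pPresent) + nlNormal mu sigma x
nlPrMaybe pPresent _ Nothing =
  negate (log (1 - pPresent))

-- negative log density of Normal(mu, sigma) at x
nlNormal :: Double -> Double -> Double -> Double
nlNormal mu sigma x =
  0.5 * log (2 * pi) + log sigma + (x - mu)^2 / (2 * sigma^2)
```

e.g. with pPresent = 0.5, a `Nothing` costs -log(0.5), about 0.69 nits, while a `Just x` always costs at least that plus the Normal's cost of x.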

[*] Or alternatively...

  m1 = estMultiState (map present dataSet)
or even...
  m1 = commonKnowledge
Do you want to model 'missingness', or not?

Defining the data-space

data Tipe = Alzheimers| Child| Despondent| ...

type Age  = Double

data Topography = Mountains| Piedmont| Tidewater
        deriving (Ord, Enum, Bounded,...)

type MissingPerson = (Maybe Tipe, Maybe Age, ... )

--standard Haskell-98.
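Deriving Read (assumed here; a cut-down, three-field sketch of the 15-field MissingPerson) is what lets a data file hold plain Haskell literals later:

```haskell
-- illustrative three-field stand-in for MissingPerson
data Tipe = Alzheimers | Child | Despondent
            deriving (Show, Read, Eq)
type Age     = Double
type Person3 = (Maybe Tipe, Maybe Age, Maybe Bool)

-- derived Read parses the literal directly
sample :: [Person3]
sample = read "[(Just Child, Just 9.0, Nothing)]"
```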

Modelling the variables

e0 = estModelMaybe estMultiState            --Tipe

e1 = estModelMaybe(estNormal 0 90 1 70 0.5) --Age

 ... etc.

estMissingPerson = estVariate15 e0 e1 e2 e3 e4 e5 e6
                      e7 e8 e9 e10 e11 e12 e13 e14

Partitioning the data-space,  class Splits


instance Splits Topography where splits = splitsBE

                    --or alternatively  = splitsOrd
 ... etc.

splitsBE  --for Bounded Enum types,
splitsOrd --for Ord types

Also implemented  'setSplits'  for high-arity Bounded Enum types.
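A hedged sketch of what Ord-style splits might look like for a Bounded, Enum type -- one binary "<= cut" split per interior cut point (the library's actual Splitter type is richer):

```haskell
-- illustrative type, standing in for Topography
data Topo = Mountains | Piedmont | Tidewater
            deriving (Eq, Ord, Enum, Bounded, Show)

-- each candidate split sends a value to branch 0 or 1
splitsOrdSketch :: (Bounded t, Enum t, Ord t) => [t -> Int]
splitsOrdSketch =
  [ \x -> if x <= cut then 0 else 1
  | cut <- init [minBound .. maxBound] ]  -- all but the last value
```

For Topo this yields two candidate splits: at Mountains, and at Piedmont.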

Running the application

dataSet <- fmap read (readFile theDataFile)
           :: IO [MissingPerson]              --input

let nw = estNetwork [1,2,3,0,4,5,6,7]
                    estMissingPerson dataSet  --model

let nullModel = estMissingPerson dataSet      --null
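MML model selection then simply compares total two-part message lengths (a trivial sketch; the 9,700 v. 10,400 nits figures are from the earlier slide):

```haskell
-- prefer the model with the shorter message (smaller is better)
prefer :: Double -> Double -> String
prefer netNits nullNits
  | netNits < nullNits = "network"
  | otherwise          = "null"
```

e.g. prefer 9700 10400 favours the network over the null model.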




L. Allison. A programming paradigm for machine learning with a case study of Bayesian networks. ACSC2006, January 2006.
L. Allison. Models for machine learning and data mining in functional programming. J. Functional Programming, 15(1), pp.15-32, January 2005 (doi:10.1017/S0956796804005301).
C. S. Wallace & D. M. Boulton. An information measure for classification. Comp. J., 11(2), pp.185-194, 1968.
C. S. Wallace. Statistical and Inductive Inference by MML, Springer-Verlag, isbn:0-387-23795-X, 2005 (the MML book).
N. Friedman & M. Goldszmidt. Learning Bayesian networks with local structure. UAI'96, pp.252-262, 1996 (classification trees in nodes of Bayes nets).

© L. Allison, School of Computer Science and Software Engineering, FIT, Monash University, Australia 3800.