^CSE454^ [01] >>

Classification- (Decision) Tree

See L. Allison. Types and Classes of Machine Learning and Data Mining, Twenty-Sixth Australasian Computer Science Conference (ACSC2003) pp207-215, Adelaide, Australia, 4-7 February 2003. Conferences in Research and Practice in Information Technology, Vol.16. Michael Oudshoorn, Ed.

CSE454 2003 : This document is online at   http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/   and contains hyper-links to other resources - Lloyd Allison ©.

 
<< [02] >>

Partitioning on input attributes

Discrete attribute:
k-way split where k is the `arity' of the attribute
 
Continuous attribute:
Try median, left-quartile, right-quartile, octiles, ...
NB. the split points depend on the current (part of the) data set
 
Multivariate:
Merge the splitting options for all attributes.
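As a sketch of one such splitting option for a continuous attribute, a hypothetical `medianSplit' (not from the original) that pairs an arity with a partitioning function (value -> part number), the shape consumed by the search on the later slides:

```haskell
import Data.List (sort)

-- Hypothetical sketch: a 2-way split of a continuous attribute at the
-- median of the *current* data.  Returns (arity, partitioning function),
-- i.e. the (arity, pFn) pairs that the later `alternatives' consumes.
medianSplit :: [Double] -> (Int, Double -> Int)
medianSplit xs =
  let m = sort xs !! (length xs `div` 2)
  in (2, \x -> if x < m then 0 else 1)
```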

<< [03] >>
data CTreeType inSpace opSpace =

  CTleaf (ModelType opSpace)   |

  CTforkTrad MessageLength
             (inSpace -> Int)
             [CTreeType inSpace opSpace]

  -- can also consider other non-traditional options

<< [04] >>
instance SuperModel (CTreeType inSpace opSpace) where

 -- NB. For simplicity only, this costs the
 -- structure at 1-bit per node.  This is
 -- only optimal for binary trees.

 msg1 (CTleaf leafModel)
   = 1 + msg1 leafModel

 msg1 (CTforkTrad fnLen f dts)
   = 1 + fnLen + (foldl (+) 0 (map msg1 dts))

The message length of a tree is that of its root node plus those of its sub-trees, if any.
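A toy, self-contained analogue of the msg1 equations above, with the leaf-model and splitting-function costs abstracted to plain Doubles (leafLen and fnLen stand in for the real model costs):

```haskell
-- Toy version of the 1-bit-per-node structure cost above.
data Toy = ToyLeaf Double | ToyFork Double [Toy]

toyMsg1 :: Toy -> Double
toyMsg1 (ToyLeaf leafLen)  = 1 + leafLen            -- 1 bit: "I am a leaf"
toyMsg1 (ToyFork fnLen ts) =                        -- 1 bit: "I am a fork"
  1 + fnLen + foldl (+) 0 (map toyMsg1 ts)
```

e.g. a fork (function cost 2) over two leaves costing 3 and 4 gives 1+2 + (1+3) + (1+4) = 12 bits.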


<< [05] >>

Make a Classification Tree an instance of a FunctionModel:

instance FunctionModel CTreeType where

  condModel (CTleaf leafModel) i = leafModel

  condModel (CTforkTrad fnLen f dts) i
    = condModel (dts !! (f i)) i
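A cut-down, self-contained version of the instance above can be run on its own; here a leaf holds its conditional model directly (just a label in this toy) and a fork routes the input datum to a sub-tree:

```haskell
-- Toy analogue of condModel: at a fork, apply the splitting function f
-- to the input datum i and recurse into the selected sub-tree.
data CTree i o = CTLeaf o | CTFork (i -> Int) [CTree i o]

condMdl :: CTree i o -> i -> o
condMdl (CTLeaf m)    _ = m
condMdl (CTFork f ts) i = condMdl (ts !! f i) i
```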

<< [06] >>

estCTree : Estimate a Classification Tree

It is convenient to have an inner, local ``search'' function:

estCTree  estLeafMdl splits  ipSet opSet =
 let  -- NB. estLeafMdl must be able to handle an empty data set

  search ipSet opSet =
   let . . .

<< [07] >>

The simplest possible tree is a single leaf node:

leaf    = CTleaf leafMdl

leafMdl = estLeafMdl opSet

leafMsg = foldl (+)
                (msg1 leaf)                 -- part 1
                (map (msg2 leafMdl) opSet)  -- part 2
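leafMsg above is a two-part message length: part 1 costs the model, part 2 costs the data under that model. A minimal standalone form of the same fold, with the model represented only by its per-datum cost function (a simplification, not the original ModelType):

```haskell
-- Two-part message length, as in leafMsg: part 1 (the model's own
-- cost) plus part 2 (the cost of each output datum under the model).
twoPartLen :: Double -> (o -> Double) -> [o] -> Double
twoPartLen part1 perDatum opSet =
  foldl (+) part1 (map perDatum opSet)
```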

<< [08] >>

Use a partitioning function, pFn, to partition ipSet and opSet into `arity' ``parts'', keeping inputs and outputs aligned:

partition  arity pFn  ipSet opSet =
 let
  p  []       []      ipParts opParts
    = (ipParts, opParts)  -- done

  p (ip:ips) (op:ops) ipParts opParts =
   let
    partNum = pFn ip
    allocate 0 (ipP:ipPs) (opP:opPs) =    -- add ip and op to
      (((ip:ipP):ipPs), ((op:opP):opPs))  -- the correct parts
    allocate n (ipP:ipPs) (opP:opPs) =
      (\(ipPs,opPs) ->
        (ipP:ipPs, opP:opPs)) (allocate (n-1) ipPs opPs)

   in uncurry (p ips ops) (allocate partNum ipParts opParts)

 in p ipSet opSet (replicate arity []) (replicate arity [])
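The partition above compiles standalone once given a type signature; here is a copy with the inner lambda's parameters renamed to avoid shadowing, exercised below with a small hypothetical 2-way split. Note that, because elements are prepended, each part comes out in reverse order of arrival:

```haskell
-- pFn maps each input datum to a part number in [0, arity);
-- the i-th output stays with the i-th input.
partition :: Int -> (i -> Int) -> [i] -> [o] -> ([[i]], [[o]])
partition arity pFn ipSet opSet =
 let
  p []       []       ipParts opParts = (ipParts, opParts)  -- done
  p (ip:ips) (op:ops) ipParts opParts =
   let
    partNum = pFn ip
    allocate 0 (ipP:ipPs) (opP:opPs) =    -- add ip and op to
      (((ip:ipP):ipPs), ((op:opP):opPs))  -- the correct parts
    allocate n (ipP:ipPs) (opP:opPs) =
      (\(ipPs', opPs') ->
        (ipP:ipPs', opP:opPs')) (allocate (n-1) ipPs opPs)
   in uncurry (p ips ops) (allocate partNum ipParts opParts)
 in p ipSet opSet (replicate arity []) (replicate arity [])
```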

<< [09] >>

Recursive search for the best (1- or) 2-level tree:

alternatives  []
 bestML bestCTree bestIpParts bestOpParts
 = (bestCTree, bestIpParts, bestOpParts)  -- done

alternatives ((arity,pFn):pFns)
 bestML bestCTree bestIpParts bestOpParts
 =
 let
  -- NB. the `1' below is acceptable but not optimal
  theTree  = CTforkTrad 1 pFn leaves   -- 2-level tree

  leaves   = map CTleaf leafMdls

  leafMdls = map estLeafMdl opParts

  (ipParts, opParts) = partition arity pFn ipSet opSet

  msgLen = foldl (+) (msg1 theTree)
    (map (\(leafMdl,opSet) ->
         foldl (+) 0 (map (msg2 leafMdl) opSet))
               (zip leafMdls opParts))
 in
  if msgLen < bestML   -- i.e. an improvement
  then alternatives pFns msgLen theTree   ipParts     opParts   -- new 1
  else alternatives pFns bestML bestCTree bestIpParts bestOpParts -- old
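alternatives threads a best-so-far message length and tree through the list of splitting options; stripped of the tree-building, it is just this keep-the-minimum fold (a hypothetical helper, not in the original):

```haskell
-- Skeleton of `alternatives': scan the candidates, replacing the
-- incumbent whenever one has a strictly smaller cost.
bestBy :: Ord c => (a -> c) -> c -> a -> [a] -> a
bestBy cost = go
  where
    go _        incumbent []     = incumbent           -- done
    go bestCost incumbent (a:as)
      | cost a < bestCost = go (cost a) a as           -- improvement
      | otherwise         = go bestCost incumbent as   -- keep old
```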

<< [10] >>
 in
  case  alternatives (splits ipSet) leafMsg leaf [ipSet] [opSet]  of

   -- subtrees?
   ((CTforkTrad msgLen pf leaves), ipParts, opParts) ->
     CTforkTrad msgLen pf (zipWith search ipParts opParts);

   -- the single leaf wins?
   (t, _, _) -> t


in search ipSet opSet

-- --------------------------------------9/2002--L.Allison--CSSE--Monash--.au--

<< [11]

Summary

Can estimate (fit, infer, learn) a classification-tree given (i) ways of partitioning the data on input attributes and (ii) an estimator for leaf models.
 
Any kind of leaf model can be used -- discrete, continuous, or multivariate;   even...

... FunctionModel- (Regression-) Trees

Use a conversion function estFunctionModel2estModel:

estFunctionModel2estModel estFn ipOpPairs =
  functionModel2model (uncurry estFn (unzip ipOpPairs))

ft = estCTree (estFunctionModel2estModel estFnMdl)
        splits trainIp trainOp
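The conversion just unzips the (input, output) pairs and applies the two-argument estimator; a minimal standalone check of that shape, with a toy estimator and hypothetical names:

```haskell
-- The shape of estFunctionModel2estModel: adapt an estimator taking
-- separate input and output lists into one taking (input, output) pairs.
pairwise :: ([i] -> [o] -> m) -> [(i, o)] -> m
pairwise estFn ipOpPairs = uncurry estFn (unzip ipOpPairs)
```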



© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3168.