See L. Allison. Types and Classes of Machine Learning and Data Mining, Twenty-Sixth Australasian Computer Science Conference (ACSC2003) pp207-215, Adelaide, Australia, 4-7 February 2003. Conferences in Research and Practice in Information Technology, Vol.16. Michael Oudshoorn, Ed.
CSE454
2003
:
This document is online at
http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/
and contains hyper-links to other resources
data CTreeType inSpace opSpace = CTleaf (ModelType opSpace) | CTforkTrad MessageLength (inSpace -> Int) [CTreeType inSpace opSpace] -- can also consider other non-traditional options
instance SuperModel (CTreeType inSpace opSpace) where -- NB. For simplicity only, this costs the -- structure at 1-bit per node. This is -- only optimal for binary trees. msg1 (CTleaf leafModel) = 1 + msg1 leafModel msg1 (CTforkTrad fnLen f dts) = 1 + fnLen + (foldl (+) 0 (map msg1 dts))
Message length of a tree is that of the node plus those of the sub-trees, if any.
Make a Classification Tree an instance of a FunctionModel:
instance FunctionModel CTreeType where condModel (CTleaf leafModel) i = leafModel condModel (CTforkTrad fnLen f dts) i = condModel (dts !! (f i)) i
It is convenient to have an inner, local ``search'' function:
estCTree estLeafMdl splits ipSet opSet = let -- NB. estLeafMdl must be able to handle an empty data set search ipSet opSet = let . . .
Simplest possible tree is a single leaf node:
leaf = CTleaf leafMdl leafMdl = estLeafMdl opSet leafMsg = foldl (+) (msg1 leaf) -- part 1 (map (msg2 leafMdl) opSet) -- part 2
use a partitioning function, pFn, to partition ipSet and opSet into arity ``parts''
partition arity pFn ipSet opSet = let p [] [] ipParts opParts = (ipParts, opParts) -- done p (ip:ips) (op:ops) ipParts opParts = let partNum = pFn ip allocate 0 (ipP:ipPs) (opP:opPs) = -- add ip and op to (((ip:ipP):ipPs), ((op:opP):opPs)) -- the correct parts allocate n (ipP:ipPs) (opP:opPs) = (\(ipPs,opPs) -> (ipP:ipPs, opP:opPs)) (allocate (n-1) ipPs opPs) in uncurry (p ips ops) (allocate partNum ipParts opParts) in p ipSet opSet (replicate arity []) (replicate arity [])
recursive search for the best (1- or) 2-level tree
alternatives [] bestML bestCTree bestIpParts bestOpParts = (bestCTree, bestIpParts, bestOpParts) -- done alternatives ((arity,pFn):pFns) bestML bestCTree bestIpParts bestOpParts = let -- NB. the `1' below is acceptable but not optimal theTree = CTforkTrad 1 pFn leaves -- 2-level tree leaves = map CTleaf leafMdls leafMdls = map estLeafMdl opParts (ipParts, opParts) = partition arity pFn ipSet opSet msgLen = foldl (+) (msg1 theTree) (map (\(leafMdl,opSet) -> foldl (+) 0 (map (msg2 leafMdl) opSet)) (zip leafMdls opParts)) in if msgLen < bestML -- i.e. an improvement then alternatives pFns msgLen theTree ipParts opParts -- new 1 else alternatives pFns bestML bestCTree bestIpParts bestOpParts -- old
in case alternatives (splits ipSet) leafMsg leaf [ipSet] [opSet] of -- subtrees? ((CTforkTrad msgLen pf leaves), ipParts, opParts) -> CTforkTrad msgLen pf (zipWith search ipParts opParts); -- the single leaf wins? (t, _, _) -> t in search ipSet opSet -- --------------------------------------9/2002--L.Allison--CSSE--Monash--.au--
Use a conversion function estFunctionModel2estModel:
estFunctionModel2estModel estFn ipOpPairs = functionModel2model (uncurry estFn (unzip ipOpPairs)) ft = estCTree (estFunctionModel2estModel estFnMdl) splits trainIp trainOp
L. Allison. Types and Classes of Machine Learning and Data Mining, 26th Australasian Computer Science Conference (ACSC2003) pp207-215, Adelaide, Australia, 4-7 February 2003. Conferences in Research and Practice in Information Technology, Vol.16. Michael Oudshoorn, Ed.