[01]

(Decision) Classification Trees II

We have covered trees over binary attributes, e.g. T/F, Head/Tail, 0/1, M/F, ...; here we cover

  1. multi-state and

  2. continuous attributes, and

  3. some other generalizations.

This document is online at   http://www.csse.monash.edu.au/~lloyd/Archive/2005-08-Tree02/index.shtml   and contains hyper-links to other resources.


[02]
[Figure: a decision tree with binary and ternary splits]
F F L L F L L L

[03]
F F L L F L L L

The old stopping rule that   #L=#F+1   only works for binary trees.

But if the arity of each fork is known before its sub-trees are transmitted, then sub-trees can be matched to parents and the end of the message can be detected.

A full ternary tree with `n' forks has 2n+1 leaves; proof by induction (consider replacing a leaf with a fork and its three children).

A full tree of constant fan-out with `n' forks has (fan_out-1)×n+1 leaves.

So P(F) and P(L) are no longer equal in general.   e.g. If all attributes are ternary, odds of F:L = 1:2,   i.e. P(F) = 1/3, P(L) = 2/3.
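As a quick check on the leaf-count rule, here is a small sketch (mine, not from the notes) that derives the approximate odds of F to L, and hence their code lengths, for a few fan-outs:

from math import log2

def leaves(fan_out, n_forks):
    # a full tree of constant fan-out with n forks has (fan_out-1)*n + 1 leaves
    return (fan_out - 1) * n_forks + 1

for fan_out in (2, 3, 4):
    n = 1000                  # many forks, so the '+1' is negligible
    nL = leaves(fan_out, n)
    pF = n / (n + nL)         # fraction of tree nodes that are forks
    pL = nL / (n + nL)
    print(f"fan-out {fan_out}: P(F) ~ {pF:.3f} ({-log2(pF):.2f} bits/F), "
          f"P(L) ~ {pL:.3f} ({-log2(pL):.2f} bits/L)")

For fan-out 3 this gives P(F) close to 1/3 and P(L) close to 2/3, as above.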


[04]
When arities of attributes differ, a practical solution is to:

Assume   P(F) = P(L) = 0.5   for the root node.

Use the arity of a node to estimate P(F) & P(L) for its sub-trees. (These probabilities are needed to assign code-words to F and L.)

P(F) = 1/arity,   P(L) = (arity-1)/arity

Use this rule recursively.
(NB. No problem with n-ary attributes in leaves. Use the [multi-state] distribution.)
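A minimal sketch (my illustration, not code from the notes) of this recursive costing of a tree's structure, writing a leaf as ('L',) and a fork as ('F', arity, children):

from math import log2

def structure_msg_len(node, p_fork=0.5):
    # message length, in bits, of stating the shape of `node';
    # a leaf is ('L',), a fork is ('F', arity, [child, ...])
    if node[0] == 'L':
        return -log2(1.0 - p_fork)       # cost of saying "leaf"
    _, arity, children = node
    cost = -log2(p_fork)                 # cost of saying "fork"
    child_p_fork = 1.0 / arity           # rule above: P(F) = 1/arity below this fork
    return cost + sum(structure_msg_len(c, child_p_fork) for c in children)

# a ternary fork at the root with one binary fork below it
tree = ('F', 3, [('L',), ('F', 2, [('L',), ('L',)]), ('L',)])
print(round(structure_msg_len(tree), 2), "bits")   # about 5.75 bits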

[05]

Continuous (Split) Attributes

e.g. height, weight, temperatures, ...

The Wallace-Patrick solution is to use the observed values of the attribute (common knowledge) as potential splitting values:

sort the values, and allocate code-words thus:

              quartile    median      quartile      etc.
      -----|-----|-----|-----|-----|-----|-----|-----

P:        1/32  1/8   1/32  1/2   1/32  1/8   1/32
code:    00100  011  00101   1   00110  010  00111

msgLen:    5     3     5     1     5     3     5    bits

An utterly insignificant amount of lost probability can be saved by renormalising so that the finite series sums to 1.0.

This method allows a split-value to be stated imprecisely and cheaply, e.g. just one bit for the median, or more precisely and expensively if justified.
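On one reading of the scheme above (an assumption on my part, not code from the notes), the cost of nominating a candidate split position follows a binary subdivision: 1 bit for the median, 3 bits for a quartile, 5 bits for the next level, and so on:

def split_code_len(index, n):
    # bits to nominate candidate split position `index' (0-based, 0 <= index < n)
    # among n sorted candidate positions
    lo, hi, bits = 0, n, 1
    while True:
        mid = (lo + hi) // 2
        if index == mid:
            return bits
        bits += 2                        # one more level of subdivision
        if index < mid:
            hi = mid
        else:
            lo = mid + 1

print([split_code_len(i, 7) for i in range(7)])   # [5, 3, 5, 1, 5, 3, 5], as above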

[06]

Search Problem

MML,   msgLen(T) + msgLen(D|T),   allows two trees to be compared fairly and the better one chosen.

The search problem is to find the best tree of all (hopefully), or at least to find a good tree.

Greedy search with limited lookahead of `L' levels works well, where L is a small integer, L >= 0.

NB. Generalizations of the model space make the search problem harder.
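A skeleton of one greedy step in those terms; msg_len_tree, msg_len_data_given_tree, leaves_of and split_at are hypothetical placeholders, not functions defined in these notes:

def total_msg_len(tree, data):
    # the two-part message: the tree itself, then the data given the tree
    return msg_len_tree(tree) + msg_len_data_given_tree(data, tree)

def greedy_step(tree, data, attributes):
    # try every (leaf, attribute) split; keep the best candidate only if it
    # shortens the total two-part message
    best, best_len = None, total_msg_len(tree, data)
    for leaf in leaves_of(tree):
        for attr in attributes:
            candidate = split_at(tree, leaf, attr)   # replace the leaf by a fork
            cand_len = total_msg_len(candidate, data)
            if cand_len < best_len:
                best, best_len = candidate, cand_len
    return best       # None if no single split improves the message

Lookahead of L levels would evaluate up to L further levels of splits below each candidate before committing to the first split.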

[07]

Generalized Leaves

Split-nodes of a tree partition data by hyper-planes perpendicular to the axes of the data measurements.

Each leaf models data in one element of the partition.

n-state data is modelled by an n-state distribution.
Continuous data can be modelled by a continuous distribution.
Multi-variate data can be modelled by a multi-variate distribution.
- in principle.
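For instance, one simple way to cost n-state data within a leaf (an illustration only; the notes do not commit to this particular estimator) is an adaptive code over the leaf's items:

from math import log2

def leaf_msg_len(values, n_states):
    # adaptive code: an item with value s costs -log2((count[s]+1) / (seen+n_states))
    counts = [0] * n_states
    seen, bits = 0, 0.0
    for v in values:
        bits += -log2((counts[v] + 1) / (seen + n_states))
        counts[v] += 1
        seen += 1
    return bits

print(round(leaf_msg_len([0, 0, 1, 0, 0, 2, 0], 3), 2), "bits")   # about 10.56 bits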

[08]

Generalized Splits

n-state discrete

Fragmentation: High-arity discrete splits quickly divide training data into small sets, stopping elaboration of the tree.

In principle, a binary split is possible on an n-state attribute by specifying a set of values to go left; the complement goes right. The message length is n bits per such test.
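A tiny sketch (my illustration) of such a subset test: one bit per attribute value says whether that value branches left, so an n-state attribute costs n bits per test.

def subset_test_cost(n_states):
    return n_states          # one bit per value of the attribute

def subset_code(left_set, n_states):
    # the n-bit code word: which of values 0..n-1 branch left
    return ''.join('1' if v in left_set else '0' for v in range(n_states))

print(subset_test_cost(4), subset_code({0, 2}, 4))   # 4 bits, code 1010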

other

In principle, arbitrary functions can be used to send data to left or right (or more) sub-trees, but their parameters must be stated (cost).

NB. Generalizations make the search problem harder.

[09]

Search

Typically:
[10]

Conclusions

Treated: multi-state attributes, continuous (split) attributes, generalized leaves and splits, and the search problem.

References


© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.