|
For multivariate data, a classification function
predicts one (or more) output attribute(s) (dependent variable(s))
given the values of the input attributes.
Depending on the application, the prediction
can be "definite" (a single value) or probabilistic (a distribution over the possible values).
A classification function is learned from, or fitted to,
training data.
It is then tested on (surprise) test data, that is, data not used in training.
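
To make the learn-then-test cycle concrete, here is a minimal sketch in Python. It is not a method from this page: the function names (`fit`, `predict_proba`, `predict`, `accuracy`), the add-one smoothing, and the tiny weather-style data set are all made up for illustration. The "model" simply counts output values for each combination of input values seen in training.

```python
from collections import Counter, defaultdict

def fit(train, classes):
    """Learn (fit) a classifier: count output values per input-attribute tuple."""
    counts = defaultdict(Counter)
    for inputs, label in train:
        counts[inputs][label] += 1
    return {"counts": counts, "classes": list(classes)}

def predict_proba(model, inputs):
    """Probabilistic prediction: a distribution over the possible output values.
    Add-one smoothing (an assumption of this sketch) avoids zero probabilities."""
    c = model["counts"].get(inputs, Counter())
    total = sum(c.values()) + len(model["classes"])
    return {k: (c[k] + 1) / total for k in model["classes"]}

def predict(model, inputs):
    """'Definite' prediction: the single most probable output value."""
    probs = predict_proba(model, inputs)
    return max(probs, key=probs.get)

def accuracy(model, data):
    """Fraction of cases whose definite prediction matches the given output."""
    return sum(predict(model, x) == y for x, y in data) / len(data)

# Hypothetical data: inputs are (outlook, temperature), output is "play?".
train = [
    (("rainy", "mild"), "yes"),
    (("rainy", "mild"), "yes"),
    (("rainy", "mild"), "no"),
    (("sunny", "hot"), "no"),
    (("sunny", "hot"), "no"),
]
test = [
    (("rainy", "mild"), "yes"),
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "yes"),  # input combination never seen in training
]

model = fit(train, classes=["no", "yes"])
print(predict_proba(model, ("rainy", "mild")))
print(predict(model, ("rainy", "mild")))
print("training accuracy:", accuracy(model, train))
print("test accuracy:", accuracy(model, test))
```

The model does better on the training data than on the test data here, because the test set contains an input combination it has never seen; that gap is why a separate test set is used.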
Over-fitting is a risk: the model fits not only the structure
but also the noise in the training data, and so predicts new data poorly.
Techniques such as cross-validation can be used to provide a
stopping criterion, i.e. to decide when the model is complex enough.
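
One way to read "stopping criterion" is: try models of increasing complexity and keep the one that predicts held-out data best. The following is a rough sketch of that idea, not this page's method. It assumes a one-dimensional k-nearest-neighbour classifier (smaller `k` means a more flexible model), synthetic data with 20% label noise, and helper names (`k_fold_indices`, `knn_predict`, `cv_accuracy`) invented for the example.

```python
import random

def k_fold_indices(n, folds):
    """Split indices 0..n-1 into `folds` disjoint, nearly equal parts."""
    return [list(range(i, n, folds)) for i in range(folds)]

def knn_predict(train, x, k):
    """Majority vote (labels are 0/1) among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def cv_accuracy(data, k, folds=5):
    """Cross-validated accuracy: each point is predicted by a model
    trained on the other folds only."""
    correct = 0
    for held_out in k_fold_indices(len(data), folds):
        held = set(held_out)
        train = [p for i, p in enumerate(data) if i not in held]
        correct += sum(knn_predict(train, data[i][0], k) == data[i][1]
                       for i in held_out)
    return correct / len(data)

# Synthetic data (an assumption of this sketch): the true class is 1 when
# x > 0.5, and 20% of the labels are flipped to act as noise.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.random()
    label = 1 if x > 0.5 else 0
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

candidates = [1, 3, 5, 9, 15]  # odd k avoids tied votes
scores = {k: cv_accuracy(data, k) for k in candidates}
best_k = max(scores, key=scores.get)
for k in candidates:
    print(f"k={k:2d}  cross-validated accuracy={scores[k]:.2f}")
print("chosen k:", best_k)
```

With `k = 1` the classifier reproduces every training label, noise included, so its training accuracy is perfect; the cross-validated score is what exposes whether that flexibility actually helps on unseen data.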
Minimum message length (MML) inference
has a natural stopping criterion and is
generally resistant to over-fitting.
The output attribute, its range of values, and the training data
are given - hence `supervised classification'.
Examples of classes of classification (decision-) functions:
|
|