Case Study: TimeSeries by Stateful Functions11 March 2004 |
|
All of the example TimeSeries models in the [200309] version of the Haskell code are based on functions (->) from the context of previous values to a Model for the next value; this suits low-order Markov models, for example, quite well. As was noted, it is obvious that in some situations another natural way to define a TimeSeries model is by a function manipulating a state, and this is developed below. It is easy to create a suitable data type to allow TimeSeries to be defined in the new way, and to give the data type the appropriate instance definitions (note that a good choice of names is important and yet somewhat arbitrary, both, and that names might well change from those in this experimental code in some future version):
(Although SST's parameters have the same types as those of TSM, [200309], note that a TSM is given a past context in reverse order but an SST is given a whole data-series in forward order. See below on possibilities for abstracting the state.) There is a good candidate (parameterless) TimeSeries model for series of Enumerated Bounded data, adaptive, which can be created in the new way: adaptive assumes that the data series is homogeneous but makes no other claim about it. The name comes from the use of adaptive codes in file compression. It operates by keeping (a state) counts of the number of times that each value has been seen prior to the current element and bases the Model for that element on the values of the counts. So that all probabilities are greater than zero, every count is initialised to one. The appropriate count is incremented when the element is passed over.
At each position, the counts are normalized (it is redundant but convenient to have the total to help), and turned into a Model (of Int) by probs2model, and thence into a Model of the data element type by modelInt2model, i.e. the TimeSeries's prediction for the next element. (Note that all TimeSeries models make one more prediction than there are elements in the given data series.)
Notesadaptive is parameterless and has zero part-one message length (complexity), although this is not true of all TimeSeries. adaptive assumes that a given data series is homogeneous,
but not that ds1++ds2 is, say.
It should be the case that, e.g.
Tuples of Enumerated Bounded types have previously been made Enumerated Bounded so adaptive can automatically apply to them. The state for adaptive consists of (total,counts) and it slowly changes as the data series is scanned. As the code stands, there is nothing to prevent a careless or unscrupulous implementor of adaptive, or a similar TimeSeries, from cheating by looking at the current element before making a prediction. It is natural to ask if the state, the state-transformation and the state-to-prediction mapping can be abstracted, and if the scanning process can be moved into the TimeSeries-instance definition of the new type [11/3/2004]. The original TimeSeriesType and TSM, [200209], perhaps form a special case of the new stateful TimeSeries with state=context, initial state=[], and state transition function ((:)d) [23/3/2004]. Abstracting the StateA first attempt at abstracting the state might be:
However, this means that a mixture of two or more StateSeriesType2 models can only be formed if they all have the same type, including the state. A better solution is to make the state an existential type (i.e. forall(!), requiring type extensions) SST3b:
Other code should not care what type the state is, provided that the type is consistent with the initial state, the state transformation function and the function mapping states to predictions.
If we only had the one constructor SST3b,
mixtures could still not be formed from stateful TimeSeries
because the result must be of the same type so it
would be necessary to build some sort of state.
An obvious candidate would be a product of the components' states,
but the details have just been hidden![*]
And even if it could be built its type differs from those of the components.
Introducing the constructor SST3a
allows mixtures without considering states.
There remains a more general difficulty: It should be possible to form a mixture of any collection of suitable TimeSeries, stateful or non-stateful, including those already defined in [200309]; only the dataSpaces should have to match. This is possible if SST3a and SST3b (or equivalents with better names) are added beside TSM in the existing TimeSeriesType rather than being declared in their own new type. Here is an updated version of adaptive:
The remaining drawbacks are modest:
On mixturesSST3a proved unnecessary, at least for mixtures.
The state can be "changed" in the time-series model,
A useful little function turns a context-based time-series model into a state-based one where state=context::[dataSpace]:
|
|
|