|
The list is a flexible abstract data type.
It is particularly useful when the number
of elements to be stored is not known before running
a program or can change during running.
It is well suited to sequential processing of elements
because the next element of a list
is readily accessible from the current element.
It is less suitable for random access to elements
as this requires a slow linear search.
Two forms of list are considered, flat lists of elements in this chapter
and hierarchical lists in a later chapter.
Types:
type list e = nil | cons(e × (list e))
Constant:
nil
Operations:
null : list e -> boolean
head : list e -> e
tail : list e -> list e
cons : e × (list e) -> list e
Rules, for all x:e and L:list e :
null(nil) = true
null(cons(x,L)) = false
head(cons(x,L)) = x
head(nil) = error
tail(cons(x,L)) = L
tail(nil) = error
Shorthand:
[x1, x2, ..., xn]
= cons(x1,cons(x2, ...cons(xn,nil)...))
The List Abstract Data Type.
The names of the operations are traditional and are derived from
versions of the Lisp programming language.
The definition of list can be read as follows:
a list of e is (=) either `nil' or (|) is `cons' applied to
an element of type e and a list.
Nil represents the empty list.
Null tests for the empty list.
The head of a list is the first element in the list.
The tail of a list is everything but the first element and is a list
not an element.
Cons constructs a list given an element and a list.
- ( The version of cons above is "uncurried".
It is also common to use "curried" versions of functions
especially in functional languages...
- cons : e -> list e -> list e
- and the pattern
- cons x L
- and so on -- saves on parentheses.)
Given the basic operations on lists,
a number of useful functions can be defined.
length(nil) = 0 |
length(cons(e,L)) = 1+length(L)
Note that the length function is recursively defined, as is the list type.
While there are infinitely many lists,
there are only two cases that length must consider.
One case is the empty list and the other is the non-empty list.
These two cases correspond to the two cases
in the definition of the list type - nil and cons(e,L) -
and they are separated by `|' to match the type definition.
Most functions on lists have the same general structure as length.
If one of the cases is missing it probably means the function is incomplete.
append(nil, L) = L |
append(cons(e,L), L') = cons(e, append(L,L'))
Append joins two lists together;
for example, append([1,2], [3,4]) = [1,2,3,4].
If the first list is empty the result is the second list, otherwise
the result is the first element of the first list followed by the result
of appending the tail of the first list to the second list.
Note that if the length of the first list is n, append takes O(n) time.
It makes a copy of this list which finally points to the second;
it does not change the first one.
In this way append does not cause any side-effect.
L ----> 1--->2nil L := [1,2]
L'----------->----------> 3--->4nil L':= [3,4]
^
|
L"----> 1--->2--->|
after L":=append(L,L') = [1, 2, 3, 4]
Note however that if the contents of L' are now changed, by some means,
then so also are the contents of L".
This kind of side-effect is a common cause of program bugs.
The merge function produces one sorted list when given two sorted lists.
merge(nil, nil) = nil |
merge(nil, L) = L |
merge(L, nil) = L |
merge(L, L') = % ~null(L) & ~null(L')
if head L < head L' then
cons(head L , merge(tail L, L'))
else {head L >= head L'}
cons(head L', merge(L, tail L'))
%this version allows duplicates
The original lists are unchanged.
Note the similarities with the array and file merge operations.
The map function applies a function f to every element of
a list and produces a new list.
In this way map f processes a single data structure
as a whole.
map(f, nil) = nil |
map(f, (cons(e,L))) = cons(f(e), map(f, L) )
eg. map(factorial, [1,2,3,4]) = [1,2,6,24]
Map is known as mapcar in some works on Lisp.
The reduce function inserts a binary function, or operator,
between every element in a list.
If the list is empty, reduce returns an identity or zero element z.
reduce(f, z, nil) = z |
reduce(f, z, cons(e,L)) = f(e, reduce(f, z, L))
Reduce(plus, 0, L) adds up the elements in a list of integers.
eg. reduce(plus, 0, [1,2,3,4])
where plus(a,b)=a+b
= 1 + reduce(plus, 0, [2,3,4])
= 1 +(2 + reduce(plus, 0, [3,4]))
= 1 +(2 +(3 + reduce(plus, 0, [4])))
= 1 +(2 +(3 +(4 + reduce(plus, 0, nil))))
= 1 +(2 +(3 +(4 + 0)))
= 10
Reduce(times, 1, L) multiplies the elements in a list of integers
together.
eg. reduce(times, 1, [1,2,3,4]) = 24
where times(a,b)=a*b
= 24
These functions for manipulating lists can be combined to
give many short but powerful programs.
For example, the program
lookup(x, L) = reduce(or, false, map(equals_x, L))
where equals_x(y) = x=y
is a search program and returns true only when x is in the list L.
Map(equals_x, L) returns a list of boolean values,
at least one value is true only when x is in L.
Reduce or's all these values together.
Processing lists and other linked data structures is important
in functional programming languages such as
Haskell, Lisp and ML.
This style of combining simple functions on these data structures
to achieve complex objectives is a powerful programming technique.
As always, the issue of efficiency must be attended to.
For example, there are various ways to reverse a list.
The simplest way but not the best is:
reverse(nil) = nil |
reverse(cons(e,L)) = append( reverse(L), [e] )
To reverse the empty list do nothing.
To reverse a non-empty list,
reverse the tail of the list and append the first element of the original list.
Given a list of length n,
an append operation takes O(n) time on average.
There are n calls to append so this reverse takes O(n2) time.
Note that the element `e' and the list containing it `[e]' have
different types.
A fast list-reversal algorithm takes O(n) time
and uses the accumulating parameter technique.
reverse(L) = reverse1(L, nil)
where
reverse1(nil,L) = L |
reverse1(cons(e,L),L') = reverse1(L,cons(e,L'))
The local function reverse1 does the real work.
Its first parameter is the input list which shrinks at each step.
Its second parameter is the accumulating parameter which grows at each step
as the head of the first is attached to it.
When the first list has shrunk to nothing, the second contains all
the original elements in reverse order and this is the final result.
There are n calls to reverse1 each of which does a constant
amount of work so the algorithm takes O(n) time overall.
Note that although there are infinitely many possible lists
there are really only two types of list -
empty ones and non-empty ones.
Almost every function that manipulates a list must deal with these two cases.
f(nil) = ..... |
f(cons(e,L)) = ....
If one of these cases is missing then there is probably an error.
Processing the empty list is usually easy.
Many functions return a constant - 0, 1, true, false or nil -
in this case.
Processing the non-empty list, cons(e,L), varies from application to
application
but often involves e or f(L) or both.
f(nil) = some constant | -- a very
f(cons(e,L)) = ...e...f(L)... -- common pattern
Look back at the various functions defined to identify this pattern
in each case.
Implementation of Flat Lists
The most natural way to implement lists
is by means of records/structs and pointers.
They can also be implemented by arrays if necessary.
Flat Lists by Records and Pointers
The list type is defined as a pointer to a record (structure).
[C/List/List.h]
.so list/list.type.t
The basic operations manipulate the pointers:
[C/List/Ops.c]
.so list/list.cons.t
Note that cons is implemented by a procedure in Turing and not
by a function because it manipulates a `collection' variable
and functions are not allowed to change variables
in that language.
There is no such problem in Algol68, C, Java or Pascal, say.
The extended operations on lists that were described previously
are easily defined.
[C/List/LengthRec.c]
.so list/list.length.t
If the implementation of lists by pointers can be assumed,
it is more efficient, if marginally less clear,
to use in-line code such as
L=nil(List)
in place of null(L) and similarly for the other basic operations.
Beware: do not use `length(L)>0' when `not null(L)' is adequate;
the former takes O(|L|) time, the latter O(1).
[C/List/Append.c]
.so list/list.append.t
The list reversal functions can be coded as follows:
[C/List/RevS.c]
.so list/list.revS.t
Some languages such as C do not support nested, local subroutines
so the fast, O(|L|) list reversal routine must be coded
by "lifting" the nested routines out, as follows:
[C/List/Rev.c]
.so list/list.revF.t
If the element type of a list can be printed by a library routine,
a list can be printed by repeated calls.
There are three cases to consider.
The empty list should appear as `[ ]',
a list of a single element should appear as `[e]'
and a list of two or more elements should have the
elements separated by commas `[a,b,...]'.
The empty list can be treated as a special case
by an if statement.
The last two cases can be dealt with by using a loop
and placing the printing of the current element
before the exit
and the printing of the comma after the exit.
In this way one fewer commas than elements are printed.
[C/List/Write.c]
.so list/list.put.t
Flat Lists by Arrays
A list can be implemented by an array of records.
Rather than using a pointer to indicate the next element,
an integer is used to hold the index of the next element.
Some convention, such as the use of zero, indicates the null list.
This implementation has the disadvantage that
an array of the maximum size of the list must be declared
even if the list(s) grow and shrink.
This method is useful in a language without dynamic storage.
The Sieve of Eratosthenes and Sets by Lists
The Sieve of Eratosthenes algorithm is based on a set of integers.
Some programming languages that provide a built in `set' data-type
restrict the range of elements of a set, e.g. to 32 or 64,
so that they can be implemented by bit-masks and bit-wise operations.
However, a set can be implemented by a list of members.
In the sieve,
composite numbers are removed from the set
until only primes remain.
To calculate all primes less than or equal to n,
the sieve is initialised to the set 2..n
or the list [2, 3, ...,n].
The routine `range' produces the list [lo, lo+1,..., hi].
[C/List/Range.c]
.so list/list.to.t
A filter routine can be written to remove all multiples
of a number P from a set.
A sieve routine can then be written to remove all composites
from a set.
Sieve requires the first member of the set to be a prime (2 is prime).
It calls filter to remove all multiples of this prime.
At this stage the second number in the set must also be prime
and it calls itself recursively to sieve the remaining (tail)
elements of the set.
[C/List/Sieve.c]
.so list/list.sieve.t
When the process terminates, only prime numbers remain in the set.
Note that lists are fast for sequential access to elements
but slow for random access.
Since Eratosthenes' algorithm takes large strides through the set,
the above implementation is slower than that given in the introductory
chapter.
Recursion and Iteration
Many of the operations on lists and other recursive data types
are naturally expressed as recursive routines.
Recursive routines have fewer state variables than iterative
routines and are frequently easier to prove correct.
However recursion does require system-stack space to operate.
Iterative versions of many routines, particularly for the simpler operations,
are straightforward.
Note that we saw an iterative routine to print a list.
[C/List/LengthItr.c]
.so list/list.lenITER.t
This is not the place to discuss the pros and cons of recursion.
Suffice it to say that recursion and iteration are both
powerful and useful techniques.
Side Effects
The list operations defined so far are free of side-effects
except to the extent that they perform input and output or reserve
dynamic space.
With pointers it is easy to create two or more aliases for one
variable or area of storage.
Assignment via one alias will affect the other implicitly.
This can be very useful indeed on efficiency grounds - when done intentionally.
It is also easy to cause this sort of side-effect unintentionally which is
a frequent source of program bugs.
These bugs can be very hard to track down
because the symptoms often show up far from the original cause.
Despite the dire warnings, the use of side-effects can sometimes
lead to more efficient programs.
In some circumstances it is also relatively easy to be assured that they
can be used safely.
For example, there may only be two list variables in use
and it may be obvious that they cannot share cells.
The cons operation adds an element to the front of a list.
If elements are being added to and deleted from a single list,
the following operations may be useful.
.so list/list.consSE.t
These operations can also be used to insert and delete internal
cells of a list.
before:
P E
| |
| |
v v
L --->1---->2---->3----->4---->5---->6nil
Del(List(L).tl)
Cons(List(P).tl, 9)
Cons(List(E).tl, 7)
after:
P E
| |
| |
v v
L --->1---------->3-->9->4---->5---->6---->7nil
A good example of a side-effect involves list reversal.
The previous versions made reversed copies of a list.
It is also possible to reverse a list in-situ by changing
the pointers in its cells.
The pointer in each of the input list
must be made to point to the previous cell rather than
the next cell.
The pointer in the first cell should become nil.
At least three pointers are needed to step along the list: L, Next and Next2.
L:| initial state
|
|
v
1--->2--->3--->4--->5nil
L:| |:Next an intermediate state
| |
| |
v v
1nil<---2 3--->4--->5nil
^
|
|Next2
|:L
|
|
v
1nil<---2<---3<---4<---5 final state
An Example of Reversal.
The cell pointed to by Next is set to point back to L.
This would loose the cell after Next so the
third pointer, Next2, remembers it
in the algorithm:
[C/List/RevSE.c]
.so list/list.revSE.t
This routine takes O(n) time
and uses O(1) storage.
Of course, if the list should be partially shared with another
then this will probably cause a major program error that is difficult to
find and correct.
Note particularly the use of the three list variables, L, Next and Next2
and the order of assignment.
At an intermediate stage the list is partially reversed; L
points to the start of the reversed section.
Next points to the start of the unreversed section.
The tail pointer of the next cell is made to point to L.
Next2 is necessary to avoid loosing our place in the unreversed list.
Finally L is moved to Next and then Next to Next2.
The process terminates when there are no more cells to reverse
and L points to the entire reversed list.
Garbage
Repeated
allocation of dynamic space by new
may cause all available storage to be used up.
It is often the case that as new dynamic structures are created
old structures are discarded or become garbage.
If they are not recycled the program may come to a halt prematurely.
Some languages provide a system garbage collector to
do this recycling as necessary but it is an expensive operation.
Instead, the free statement is used to return dynamic
storage for reuse.
This is of course a special kind of side-effect and unless a
language processor carries out checks for aliases, at some cost,
its use may cause program bugs unless care is taken.
In particular part of a structure must not be freed if it
is shared with a second structure - consider append.
Free only returns one unit of dynamic storage.
To free a complete structure it must be traversed and each cell freed.
[C/List/Dispose.c]
.so list/list.free.t
Testing and Debugging
If all goes well this section will be unnecessary
but the manipulation of pointers is a fruitful area for program bugs.
The use of pre- and post-conditions and other assertions which can be tested
when the program is being developed, and commented out or turned-off
in production, is a good idea.
It is useful to write a list output routine as one
of your first list routines.
This enables tracing statements to be inserted to print out the values
of list variables if and when a program fails.
Routines on lists can be easily tested or debugged in isolation
especially if they are written in a functional style, free
of side-effects.
Recall that almost all routines must contain this pattern:
f(nil) = some constant |
f(cons(e,L)) = ...e...f(L)...
It is usually sufficient to test list parameter values
for the cases of an empty list,
a list of several elements and sometimes also a list of one element.
If there are two list parameters this gives up to nine cases.
If there are other parameters there may be other cases
and combinations.
Keeping routines small, simple and free of side-effects and global
variables reduces the number of cases and interactions
to be considered.
If routines are free of side-effects and are individually correct
they can usually be combined with great confidence.
It is the presence of side-effects, especially the
manipulation of global variables, that causes surprising
interactions and bugs.
The most common errors when using pointers are:
- unterminated lists (should terminate with nil),
- lost or tangled pointers often because pointers assigned in wrong order,
- isuse of shared cells, overwriting, premature disposal.
It is necessary to assign some value, such as nil, to a pointer.
An uninitialised pointer variable contains an undefined value.
If it is used and
you are lucky the program will stop with an immediate error
otherwise it may behave incorrectly but take a long time
for this to become evident.
A list structure may get linked into a circle.
This can be useful if it is deliberate - see the chapter on queues.
However, many functions will loop forever when given a circular list.
If two lists share cells, changing
one list may change the other surprisingly.
Exercises
- Write a routine to sort a list without side-effects.
- Write a routine to sort a list in-situ as a side-effect.
- For each sorting method in the chapter on sorts,
determine if it can be applied to lists.
- Write an iterative routine to join one list to the end of
another as a side-effect.
- Implementing sets as lists,
give set intersection, union, difference and membership routines,
without side-effects.
- Implement very large integers as lists of digits in some base.
Give routines for integer equality and the basic operations such as addition,
subtraction, multiplication and (harder) division.
© L_Allison,
Department of Computer Science, U.W.A. 1984-86.
|
|