CSE2304 Prac #2 semester 1, 2000

Practical #2, semester 1, 2000
Group A: week 5, 27 March...
Group B: week 6, 3 April...

The problem is to write a program which reads a sequence of `N' floating point numbers from a file, finds the `M' smallest values in the file, and prints them. M and the filename are given as command line parameters (see ..../C/Basic/). This operation might be useful if studying the tail of a probability distribution, for example.
N is potentially very large, probably too large for all of the N numbers to fit into memory.
M is expected to be much less than N, M<<N, certainly small enough for M numbers to fit in memory.
Write a program in C to carry out the task:

   smallest M thefile

e.g. if thefile contains

7.3
-3.14
10097.1
-99.0
7.3
-3.14
3.14
1000200.779
7.3
7.4

then `smallest 4 thefile' are:

-99.0 -3.14 -3.14 3.14

You must NOT sort all N numbers and then print the first M values! Your program must not use more than O(M)-space. Efficiency is important.

Group A: You must use an array, rather like a stack, to store the M smallest numbers seen so far, in sorted order, as they are collected. You might care to think about [Insertion Sort] although you must not use it, or any other sort, unchanged.
Group B: You must use a priority queue, alias a Heap, as used in [Heap Sort] to store the M smallest numbers seen so far, in the Heap ordering (with the largest of the M-smallest to date at the top). You must not sort the M smallest elements into (asc | desc)ending order, they must be maintained as a `heap'. (It is therefore not necessary to print the M-smallest in ascending order.) You must not sort all N numbers and then print the first M values.
[4 marks]

Both groups must instrument their programs so that the numbers of comparisons of numbers carried out are printed. What can you say about the best-, average- and/or worst-case time-complexity of your program as a function of M and N?
[1 mark]

Every program must be commented with appropriate assertions, pre- & post-conditions, and loop invariants, to show that the program is correct.
[1 mark]

Total: [6 marks]

Practical #2, semester 1, 2000 Group A: week 5, 27 March... Group B: week 6, 3 April...

Practical #2, semester 1, 2000
Group A: week 5, 27 March...
Group B: week 6, 3 April...