^CSE2304/2006
^
Practicals 2-3, CSE2304, Monash, Semester 1, 2006
The
practical problems are available online via the unit home page for this year.
The [on-line] practical problems
may include [links], corrections and supplementary material and
are to be taken as the reference documents.
Check the on-line material regularly.
Class:
You must prepare your solution to
this programming exercise in advance
(a 6pt subject => 12-hours work per week, every week).
The designated platform, on which your solution must be demonstrated and on
which it will be marked, is
the `gcc' compiler
running on `Linux'.
If you develop a solution on another platform,
it is your responsibility
to ensure that it works correctly on the designated platform.
Read the information on the practical-work guide,
C and code modularity, missed practicals
and plagiarism on the unit
[home page].
It is generally better to have a program that
solves only part of the problem but compiles and runs
rather than to have a more complex program that
crashes or, even worse, does not compile.
So keep copies of old working partial solutions.
Practical work
is marked on the performance of your program
and also on your understanding of the program.
I.e. Perfect program with zero understanding => zero marks!
"Forgetting" is not an acceptable explanation for
lack of understanding.
Unless otherwise noted, you must write all the code yourself, and
may not use any external library routines, the usual I/O (e.g. printf)
and mathematical (e.g. log) routines excepted.
The exam, and the practical work, are both hurdles (half-marks)
for the subject.
If you fail one, or the other, or both, the highest mark that
you can get for the subject is 44%(N).
Demonstrators are not obliged to mark programs
that do not compile or that crash.
Time allowing, they will try to help in tracking down errors,
but they are not required to mark programs in such a state,
particularly those that do not compile.
Therefore keep backup copies of working partial-solutions (also see above).
Objectives:
Problem solving, and
adapting algorithms or data-structures or both.
This practical is based on a real-world 'mapping' problem
from computational molecular biology;
the underlying abstract problem has been identified for you and
the many distracting complications of its origins have been removed.
Efficiency is important.
It is possible to build a good, practical solution
to the problem using the techniques of CSE2304 and its prerequisites
but they have to be selected wisely and adapted to the problem.
Some inventiveness may be necessary.
Practical 2, week 6 begins 3 April, and also
practical 3,
week 9
begins 1 May
- Two fragments [a,b] and [c,d], where a, b, c and d
are floating-point numbers between 0.0 and 1.0 inclusive,
0<=a<b<=1 and 0<=c<d<=1, can overlap each other in
the following ways:
1:
[a___________b]
[c_______________d]
that is a<=c<=b<=d
|
2:
[a______b]
[c____d]
that is c<=a<=d<=b
|
3:
[a________________b]
[c___d]
that is a<=c<d<=b
|
4:
[a_____b]
[c______________d]
that is c<=a<b<=d
|
They overlap if c<=b and d>=a.
They do not overlap if c>b or d<a.
- Note, [p, q] and [q, r] are considered to overlap.
- Problem:
Write a program, 'fragments inputfile', to solve the following problem.
(There are examples of processing command-line parameters here:
[..../tildeProgLang/C/Basic/].)
- You are given an input file which contains
a sequence of n pairs of numbers representing
the start and end points of fragments
[i1,j1], [i2,j2],
..., [in,jn].
- For each fragment, [im,jm],
you must find the numbers
of all the earlier fragments
[ik,jk] k<m, if any, that overlap
[im,jm].
Print "m:" at the start of a line
followed by the numbers of the earlier overlapping fragments, if any,
in ascending order without duplicates.
If a line would contain more than five overlaps, break it by
printing "\n+:" before the 6th, 11th, 16th, ..., etc., numbers.
(You may use the `qsort' library routine to sort the fragments
that overlap [im,jm], if you want to.)
- Process each fragment [im,jm] as soon as it is read.
- e.g.
inputfile | notes |
0.23 0.31
0.61 0.90
0.25 0.40 0.35
0.65
0.01 0.0109 0.60 0.62
0.0 1.0
|
fragment 1, [0.23, 0.31]
fragment 2, [0.61, 0.90]
fragment 3, [0.25, 0.40], & left of the 4th [0.35,
0.65]
etc.
|
output | notes |
1:
2:
3: 1
4: 2 3
5:
6: 2 4
7: 1 2 3 4 5
+: 6
|
none, of course, for the first
|
- Efficiency is important.
Note that n can be very large indeed, possibly in the millions, or bigger.
- There are two aspects to the size of an instance of this problem:
- The size of the input, n, the number of given fragments.
- The size of the output, v, the total number of overlaps found.
Obviously a program cannot possibly have time-complexity better than O(n+v).
That lower bound may be unachievable,
particularly for some kinds of "difficult" data -- see below.
Ideally we do not want an n2 term in the complexity.
It is expected that v<<n2 if n is big.
- Data.
[Sample data] will be made available;
check the on-line version of the practicals.
It is likely that statistical properties of the data
will affect the run-time of a solution.
The "easy" kind of data is likely to be "random"
where there is little correlation between fragments,
where fragments of a given size and position are
equally likely to occur anywhere in the input file, and
so on.
Just what features make data "difficult" depends on the solution but
correlations between the sizes, positions and/or orderings of fragments
are likely to cause difficulties.
Marking
Practical 2
Does not compile or often crashes --> 0/3 marks.
A running program which
reads the data and has a correct predicate 'overlap'
--> 2/3 marks,
and also solves all small problems correctly
--> 3/3 marks.
NB. These marks are for prac2;
you cannot "pick them up" later at prac3.
No extra marks at this stage for solving large problems.
Practical 3
Does not compile or often crashes --> 0/9 marks.
As for prac2 "...solves all small problems",
plus well written including any appropriate invariants,
pre- and post-condtions and other assertions, and
with solid evidence of good testing --> 4/9 marks.
Plus better than O(n2) run-time for
large data-sets of the "easy kind" (see above), and
solid evidence of good testing --> 5/9 to 6/9 marks.
Plus rigorous argument for correctness and termination,
--> 7/9 marks.
Plus thorough exploration of time-complexity
for "easy" and "difficult" data (see above),
small to large n and small to large v
--> 8/9 to 9/9 marks.
NB. The demonstrators will not mark prac3 until week 9.
© 2006
L. Allison,
Faculty of Information Technology (Clayton),
was School of Computer Science and Software Engineering,
Monash University, Australia 3800.
Created with "vi (Linux & IRIX)", charset=iso-8859-1
For the purposes of sorting the fragments that
overlap the current fragment (see #5 above), it is acceptable to allocate
an array of a size that has been "#defined",
provided that the size can be easily changed.
(A fixed size is inelegant but this issue is not the point of the prac.)
- If the average fragment length is made smaller
(by making the 3rd parameter of the
[generator] large)
then the number of overlaps, v, will be made smaller, on average.
Make the average length small enough and v will tend to zero,
for a given value of n.
5/4/2006
Here is an exercise that might or might not help:
Print out 10 random fragments, e.g. by running the data
[generator].
Cut up the paper so that you have 10 pieces,
each piece with one fragment on it.
Deal the 10 pieces of paper into 2 piles of 5 each.
Is there any summary information that you could calculate and store
for each pile that could tell you if
pile1, say, might contain a fragment that overlaps
the next 11th fragment or if it
cannot possibly contain such a fragment?
- Q4, tute3 might be of some help for prac3,
but it does not solve the prac.
11/4/2006
Previous fragments that strictly contain the current fragment,
[im,jm], cause some solutions the most problems:
There might be a million previous fragments [a,?] such that
a<=im
and a million previous fragments [?,b] such that b>=jm, and
yet none of them might satisfy both
a<=im & b>=jm.
26/4/2006