^CSE2304^

Practicals 4/5(A) and 4/5(B), CSE2304, CSSE, Monash, Semester 1, 2005

The practical problems are available online via the unit home page for this year. The [on-line] practical problems may include [links], corrections and supplementary material and are to be taken as the reference documents. Check the on-line material regularly.

Student: You must prepare your solution to this programming exercise in advance (a 6pt subject => 12-hours work per week, every week). The designated platform, on which your solution must be demonstrated and on which it will be marked, is the `gcc' compiler running on `Linux'. If you develop a solution on another platform, it is your responsibility to ensure that it works correctly on the designated platform. Read the information on the practical-work guide, C and code modularity, missed practicals and plagiarism on the [home page]. It is better to have a program that solves only part of the problem but compiles and runs rather than to have a more complex program that crashes or, even worse, does not compile. So keep copies of old working partial solutions.
    Practical work is marked on the performance of your program and also on your understanding of the program. I.e. Perfect program with zero understanding => zero marks! "Forgetting" is not an acceptable explanation for lack of understanding. Unless otherwise noted, you must write all the code yourself, and may not use any external library routines, the usual I/O (e.g. printf) and mathematical (e.g. log) routines excepted.
NB. Remember that each week's practical group is set its own specific problem. Make sure that you do the correct problem for your week! You will get zero marks if you solve the wrong problem. The exam, and the practical work, are both hurdles (half-marks) for the subject. If you fail one, or the other, or both, the highest mark that you can get for the subject is 44%(N).

Demonstrators are not obliged to mark programs that do not compile or that crash. Time allowing, they will try to help in tracking down errors, but they are not required to mark programs in such a state, particularly those that do not compile. Therefore keep backup copies of working partial-solutions (also see above).

Objectives: The main purpose of this practical is to apply your knowledge of graphs and to implement a graph algorithm.


Practical 4/5(A), week 10, 9-13 May and 12, 23-27 May

NB. Number
of differences
squared.
--2/5/05
& some data[*]:
[1] & [2]. Also
make your
own! --4/5/05
There are four DNA letters or "bases" {A, C, G, T}. Given two DNA sequences, s1 and s2, of length K, an integer-valued distance (not the edit-distance) can be defined on the sequences:
d(s1,s2) = (Sumk=0..K-1 (if s1[k]=s2[k] then 0 else 1))2
e.g. d("ACATTA","CCATGA") = 4
Given N DNA sequences, s0, ..., sN-1, d(,) defines a symmetric, integer-valued N×N adjacency matrix of a weighted graph over the sequences in the obvious way.
Write a C program to process this kind of data as follows:
Read a given file (specified by a command-line parameter) containing N, K, and N DNA sequences of length K, e.g.
5 6
ACATTA
CCATTA
ACATGG
CGAATA
ACATGG
By the end of practical-4, you must have a program to read the data, and compute the adjacency matrix and print it in some simple format.
By the end of practical-5, you must have a program to also compute the shortest-paths from the "source" (the 1st sequence) to every other sequence, and for each such path print the cost of the path and the path itself (its edges).
Note that there is no particular limit on the maximum values of N or K.
Your program could be used to explore evolutionary relationships between fixed-length DNA sequences.

[*] Apparently from the malaria 'rifin' gene family -- via Tobias S.

Marking (end of practical-5)

Does not compile or crashes --> 0-5/12 marks.
Reads data, computes d(,), prints adjacency matrix --> 6/12 marks.
Prints all shortest paths as required, correct on all of demonstrator's tests --> 8/12 marks.
As above plus elegant code with appropriate assertions, invariants, pre- and post-conditions --> 10/12 marks.
As above plus well tested (with evidence) --> 12/12 marks.

Practical 4/5(B), week 11, 16-20 May and 13, 30 May - 3 June

[E.g.] just
one data
test set.
Also make
your own!
--2/5/05
Given K "attribute" values or measurements for two "things", t1 and t2, a real-valued distance can be defined on the things:
d(t1,t2) = sqrt( Sumk=0..K-1 (t1[k] - t2[k])2 )
e.g. d([3.1, 1.7, 4.8, 2.2], [3.0, 1.6, 4.9, 2.2]) = sqrt(0.03)
Given such "multivariate" data on N things, d(,) defines a symmetric, real-valued, N×N adjacency matrix of a weighted graph over the things in the obvious way.
Write a C program to process this kind of data as follows:
Read a given file (specified by a command-line parameter) containing N, K, and K attribute values for each of N things, e.g.
6 4
3.1 1.7 4.8 2.2
3.0 1.6 4.9 2.2
2.9 1.8 4.6 2.1
2.9 1.5 5.0 2.3
2.9 1.7 4.9 2.4
3.0 1.9 4.4 1.9
By the end of practical-4, you must have a program to read the data, and compute the adjacency matrix and print it in some simple format.
By the end of practical-5, you must have a program to also compute a minimum spanning tree (MST) [1] [2] of the graph and print the edges of the MST and the cost of the MST in some simple format.
Note that there is no particular limit on the maximum values of the attributes, or of N or K.
Your program could be used to explore taxonomic relationships present in multi-variate statistical data. E.g. The things could be animal or plant specimens.

Marking (end of practical-5)

Does not compile or crashes --> 0-5/12 marks.
Reads data, computes d(,), prints adjacency matrix --> 6/12 marks.
Prints MST as required, correct on all of demonstrator's tests --> 8/12 marks.
As above plus elegant code with appropriate assertions, invariants, pre- and post-conditions --> 10/12 marks.
As above plus well tested (with evidence) --> 12/12 marks.


© 2005 L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.
Created with "vi (Linux & IRIX)",   charset=iso-8859-1