^CSE2304^

Pracs 4(A) and 4(B), CSE2304, CSSE, Monash, Semester 1, 2004

The [on-line] versions of the prac's may include [links], corrections and supplementary material and are to be taken as the reference documents. Check the on-line material regularly.

Candidate: You must prepare your solution to this programming exercise in advance (a 6pt subject => 12-hours work per week, every week). The designated platform, on which your solution must be demonstrated and on which it will be marked, is the `gcc' compiler running on `Linux'. If you develop a solution on another platform, it is your responsibility to ensure that it works correctly on the designated platform. Read the information under the [prac' guide], [on C and code modularity], [missed pracs] and [plagiarism] links on the [home page]. It is better to have a program that does only part of the prac' but compiles and runs rather than to have a more complex program that crashes or, even worse, does not compile. So keep copies of old working partial solutions.

Prac's are marked on the performance of your program and also on your understanding of the program. I.e. Perfect program with zero understanding => zero marks! ``Forgetting'' is not an acceptable explanation for lack of understanding. Unless otherwise noted, you must write all the code yourself, and may not use any external library routines, the usual I/O (e.g. printf) and mathematical (e.g. log) routines excepted.

NB. Remember that, from prac1, each week's prac' groups are set their own specific problems. Make sure that you do the correct problem for your prac'! You will get zero marks if you solve the wrong problem. The exam, and the prac' work, are both hurdles (half-marks) for the subject. If you fail one, or the other, or both, the highest mark that you can get for the subject is 44%(N).

Demonstrators are not obliged to mark programs that do not compile or that crash. Time allowing, they will try to help in tracking down errors, but they are not required to mark programs in such a state, particularly those that do not compile. Therefore keep backup copies of working partial-solutions (also see above).

Objectives: Graphs, reading several files. You need your solution to prac3.
(Note that, here, |x| is the absolute value of x, e.g. |3.14| = |-3.14| = 3.14. As in prac_3, you may use the Standard C Library Function `qsort' (see man qsort) if it is useful to you; don't if it isn't!)


Prac 4(A), week 10, 10-14 May 2004

[some data]
Given two C-code files, fi and fj, you know [how] to compute the normalised symbol-frequencies, pi(symbol) and pj(symbol), for each file.
Let dij = SUMs |pi(s) - pj(s)|,   i = 1..n,  j = 1..n, where s ranges across all kinds of symbol.
This defines a symmetric, n×n, adjacency matrix, dij, for a collection of C-code files, f1, f2, ..., fn.
 
Write a program
prac4a f1 f2 ... fn
to compute the adjacency matrix for files f1 f2 ... fn, and print it in some simple format.

Marking

Calculates the distance between at least two files, recognises and processes most kinds of symbol correctly, but maybe not e.g. "-" v. "--" v. "-=" v. "->" say, -->3/6 marks.
Calculates the distance between any number of files and handles most kinds of symbol correctly -->4/6 marks.
Calculates the distance between any number of files, handling all kinds of symbol correctly, including escape chars in strings, tricky comments, etc. -->5/6 marks.
As above plus well written and thoroughly tested with evidence! -->6/6 marks.

[some data]
NB. Prac3: "Words that do not appear in [a file] at all are ranked [by that file] after those that do appear. If two words occur equally often (even zero times), break the tie (i) by ranking a shorter word first and, for words of equal length, (ii) alphabetically."

Prac 4(B), week 11, 17-21 May 2004

Given two text files, ti and tj, you know [how] to compute the ranks, ranki(word) and rankj(word), of the correctly spelled words (and of "000" (zeroes) for all misspelled words) in ti and in tj.
Let rij = (1/(vi+1))×{{SUMw in ti |ranki(w)-rankj(w)|} + |ranki("000")-rankj("000")|},   i = 1..n,  j = 1..n,   where w ranges over all distinct words in ti, and vi is the number of distint words (not word occurrences) in ti.
This defines a non-symmetric, n×n, adjacency matrix, rij, for a collection of text files, t1, t2, ..., tn.
 
Write a program
prac4b t1 t2 ... tn
to compute the adjacency matrix for files t1 t2 ... tn, and print it in some simple format.
NB. Place the dictionary (see 3B) in a fixed, hard-coded location, e.g. dictionary.

Marking

Calculates the distance between at least two files but not necessarily doing stemming nor checking spelling -->3/6 marks.
Calculates the distance between any number of files, checking spelling, but not necessarily doing stemming -->4/6 marks.
Calculates the distance between any number of files, checking spelling and doing stemming correctly -->5/6 marks.
As above plus well written and thoroughly tested with evidence! -->6/6 marks.


© 2004 L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.
Created with "vi (Linux & IRIX)",   charset=iso-8859-1