^CSE2304^
Pracs 4(A) and 4(B), CSE2304, CSSE, Monash, Semester 1, 2004
The [on-line] versions of the prac's may
include [links], corrections and supplementary material and
are to be taken as the reference documents.
Check the on-line material regularly.
Candidate:
You must prepare your solution to
this programming exercise in advance
(a 6pt subject => 12-hours work per week, every week).
The designated platform, on which your solution must be demonstrated and on
which it will be marked, is
the `gcc' compiler
running on `Linux'.
If you develop a solution on another platform,
it is your responsibility
to ensure that it works correctly on the designated platform.
Read the information under the [prac' guide],
[on C and code modularity], [missed pracs]
and [plagiarism] links on the
[home page].
It is better to have a program that
does only part of the prac' but compiles and runs
rather than to have a more complex program that
crashes or, even worse, does not compile.
So keep copies of old working partial solutions.
Prac's are marked on the performance of your program
and also on your understanding of the program.
I.e. Perfect program with zero understanding => zero marks!
``Forgetting'' is not an acceptable explanation for lack of understanding.
Unless otherwise noted, you must write all the code yourself, and
may not use any external library routines, the usual I/O (e.g. printf)
and mathematical (e.g. log) routines excepted.
NB.
Remember that, from prac1,
each week's prac' groups are set their own specific problems.
Make sure that you do the correct problem for your prac'!
You will get zero marks if you solve the wrong problem.
The exam, and the prac' work, are both hurdles (half-marks)
for the subject.
If you fail one, or the other, or both, the highest mark that
you can get for the subject is 44%(N).
Demonstrators are not obliged to mark programs
that do not compile or that crash.
Time allowing, they will try to help in tracking down errors,
but they are not required to mark programs in such a state,
particularly those that do not compile.
Therefore keep backup copies of working partial-solutions (also see above).
Objectives:
Graphs, reading several files.
You need your solution to prac3.
(Note that, here, |x| is the absolute value of x,
e.g. |3.14| = |-3.14| = 3.14.
As in prac_3, you may use the Standard C Library Function `qsort'
(see man qsort) if it is useful to you; don't if it isn't!)
- Given two C-code files, fi and fj,
you know [how]
to compute the normalised symbol-frequencies,
pi(symbol) and pj(symbol),
for each file.
- Let
dij
= SUMs |pi(s) - pj(s)|,
i = 1..n,
j = 1..n,
where s ranges across all kinds of symbol.
- This defines a symmetric, n×n, adjacency matrix,
dij, for
a collection of C-code files,
f1, f2, ..., fn.
-
- Write a program
- prac4a f1 f2 ... fn
- to compute the adjacency matrix for files f1 f2 ... fn,
and print it in some simple format.
Marking
Calculates the distance between at least two files,
recognises and processes most kinds of symbol correctly,
but maybe not
e.g. "-" v. "--" v. "-=" v. "->" say,
-->3/6 marks.
Calculates the distance between any number of files
and handles most kinds of symbol correctly
-->4/6 marks.
Calculates the distance between any number of files,
handling all kinds of symbol correctly, including escape chars in strings,
tricky comments, etc.
-->5/6 marks.
As above plus well written and thoroughly tested with evidence!
-->6/6 marks.
[some data]
NB. Prac3:
"Words that do not appear in [a file]
at all are ranked [by that file] after those that do appear.
If two words occur equally often
(even zero times),
break the tie (i) by ranking a shorter word first and,
for words of equal length, (ii) alphabetically."
|
-
- Given two text files, ti and tj,
you know [how] to compute the ranks,
ranki(word) and rankj(word),
of the correctly spelled words
(and of "000" (zeroes) for all misspelled words)
in ti and in tj.
- Let
rij
= (1/(vi+1))×{{SUMw in ti |ranki(w)-rankj(w)|}
+ |ranki("000")-rankj("000")|},
i = 1..n,
j = 1..n,
where w ranges over all distinct words in ti, and
vi is the number of distint words (not word occurrences)
in ti.
- This defines a non-symmetric,
n×n, adjacency matrix, rij,
for a collection of text files,
t1, t2, ..., tn.
-
- Write a program
- prac4b t1 t2 ... tn
- to compute the adjacency matrix for files t1 t2 ... tn,
and print it in some simple format.
- NB. Place the dictionary (see 3B) in
a fixed, hard-coded location, e.g. dictionary.
Marking
Calculates the distance between at least two files
but not necessarily doing stemming nor checking spelling
-->3/6 marks.
Calculates the distance between any number of files,
checking spelling, but not necessarily doing stemming
-->4/6 marks.
Calculates the distance between any number of files,
checking spelling and doing stemming correctly
-->5/6 marks.
As above plus well written and thoroughly tested with evidence!
-->6/6 marks.
© 2004 L. Allison,
School of Computer Science and Software Engineering,
Monash University, Australia 3800.
Created with "vi (Linux & IRIX)", charset=iso-8859-1