[A+DS] <<<prac 3<<< >>>prac 5>>>

CSE2304, Prac' 4, week 8 and 9, 26 April - 7 May 1999

The program [commandLine.c] shows how to handle command line parameters and how to open a named file.
Comp. Sci. & S.W.E. Monash

Write a C program to read in a text string, T, from a file, ignoring all characters not in the alphabet (e.g. {a,c,g,t}). Then read zero or more search strings from standard input. For each search string, S, if S occurs in T then print all of its start locations within T, otherwise print "not found". The objective is to find each S quickly, possibly at the price of some time spent in pre-processing T.

The program should accept the file name as a command-line parameter, e.g.
yourprog HUMHBB

You can allocate a large array for T and assume that it will fit, just printing an error message and terminating if it does not (although it is better if the program is more flexible and robust).

You can use your code from prac#3, e.g. using binary search on the array of sorted substrings to find S, assuming that is what you constructed for #3.

Instrument the program so that it prints the number of character comparisons performed in locating each S. How does this compare to log(|T|)?

Demonstrate the program on the DNA string in [HUMHBB].

[CSE2304 / CSC2040 6 marks]



Advanced: Also implement Rabin's string searching algorithm, instrument it, and compare its complexity with the method above (on HUMHBB). How many string searches would it require to "pay" for the cost of constructing the data-structure(s) in part one?

[CSE2304 / CSC2040 2 bonus marks]



L.Allison, Comp. Sci. and S.W.E., Monash University, Australia.