
## CSE2304 Algorithms and Data Structures, Progress 2005

NB. Page numbers do not necessarily correspond exactly to lecture numbers; the notes are for sale in the bookshop.

Lectures: Tues 11.00am (8/R1), Wed 2.00pm (11/H1).

### Week 1 starts Monday 28 February

• [Intro.], admin and the start. Practical 0 (shared with CSE2303) week 2, tutorial 1 weeks 2 and 3, practical 1 weeks 3 and 4.
This unit is about problems, algorithms and abstract data structures, that is abstract mathematical things, not about C programming as such. Boolean, Int and Tree are examples of abstract data types. A parser is a useful source of (parse-) trees; see practical 1.
(Grammars are mostly used to parse sentences in a language. Sometimes a grammar is used to generate sentences, e.g. to generate test data for another program. For interest only, Andrew Bulhak's postmodernism generator is a spectacular example of using a grammar to generate pseudo-random essays.)
• [List and tree] abstract data types (ADTs) and examples of useful algorithms on them. We can prove properties of ADTs and of operations on them -- this beats testing (a test can only detect the presence of an error, not the absence of errors). Saw slow (O(n²)) and fast (O(n)) tree-flattening algorithms. (Note that append L1 L2 takes O(|L1|)-time but cons takes O(1)-time, i.e. is a constant-time operation.) And...

### Week 2 starts Monday 7 March [prac0(all)] [tute1]

• ... example of operations on parse trees (over-run from last Wed.).
[Proof of correctness] and termination of algorithms: A test can only show the presence of a bug; a proof is much stronger and shows that an algorithm is always correct. ADTs such as list, tree, stack, ... are defined in an algebraic way and we can do algebra on many algorithms over such ADTs, as in L1 & [L2]. Some algorithms, however, rely on manipulating a 'state', from satisfying a pre-condition, through intermediate steps, to satisfying a post-condition. We use logic to prove that such an algorithm (i) is correct if it terminates, and (ii) terminates. Binary search is just an example; the point of [lecture 3] is the method of reasoning with pre- & post-conditions, loop invariants and other assertions.
• [Simple sorting] algorithms used as more examples for algorithm analysis and proof in the style of [L3]. We also saw that starting from a rough idea for a loop invariant leads us to the design of selection sort, O(n²)-time, always. Reconsidering the invariant, a weaker condition is easier to maintain, which leads to insertion sort; it is twice as fast on average and takes O(n)-time in the best case. Also discussed 'stability' in sorting. Use the demonstrations [SS] & [IS].

### Week 3 starts Monday 14 March [prac1(A)] [tute1]

•  [help-room] timetable s1 2005 -- consultation.
[Priority queue] and heap-sort. The sorts in [L4] divide the array into two parts (sorted & not-yet sorted), choose an element from one part and add it to the other. Selection sort (SS) spends time choosing the "best" (smallest not-yet-sorted) element -- so adding it is easy. Insertion sort (IS) chooses "any old" (the next) element so it must spend time on adding it correctly. (IS is faster on average and in the best case.) Heap sort chooses the "best" element (here the largest) but does it quickly using the [heap] property. One heap operation takes O(log n)-time, and there are about 3n/2 of them, so [heap sort] runs in O(n log n)-time. (Also see [PQ] & [HS].)
•  If (only if) you want a programming challenge -- try to invent an O(n)-time, O(1)-space, in-situ merge algorithm -- v. hard!
[More fast sorts], merge-sort and quick-sort are examples of the divide & conquer strategy. Q: What is the best sort? A: It depends. QS is fastest on average but O(n²) in the worst case. HS and MS always take O(n log n)-time but HS has poor paging behaviour and MS uses O(n) extra work space. If sorting small arrays, insertion sort may even be OK. (Also see [MS] & [QS].)

### Week 4 starts Monday 21 March [prac1(B)]

•  [help-room] timetable s1 2005 -- consultation. (For interest only, a recent research paper on an alignment algorithm.)
In recent lectures we used mostly familiar algorithms as examples for analysis -- correctness and time- & space-complexity. Here is an unfamiliar problem and algorithm. The [edit distance] algorithm is an example of the dynamic programming strategy. Note that the algorithm establishes an invariant property on a data structure, the D[,] matrix, that D[i,j]=dist(A[1..i],B[1..j]); as a result D[|A|,|B|]=dist(A,B) when the algorithm has finished.
• [Hirschberg's] algorithm computes the value of the edit-distance and also an optimal alignment in O(n)-space, O(n²)-time. Note the trade-off: space is reduced n² ---> n at the price of the #instructions executed doubling.

### Easter, Friday 25 March - Friday 1 April 2005

Do prepare prac2 and also tute2!

### Week 5 starts Monday 4 April [prac2(A)] [tute2]

• [Tables] operations on a lookup table; implementation by unsorted and sorted arrays, hash-tables (revision), and binary search tree (including deletion). Note that the number of nodes in a well-balanced binary tree of height h grows as 2^h -- see Q4.1 of tutorial 1 -- but this is not true of unbalanced, degenerate trees.
• [AVL] trees, |height(left)-height(right)|<=1, and left and right are AVL-trees. The height of an AVL-tree containing n elements is O(log(n)), even in the worst case. Search, insertion, deletion take O(log(n))-time, even in the worst-case.  [Demo].

### Week 6 starts Monday 11 April [prac2(B)] [tute2]

•  There are solutions to tutorial-1 on the 2nd year notice board -- for as long as space is sufficient.
[Trie & PATRICIA] are implementations of lookup tables for holding strings. A simple trie is an |alphabet|-way branching tree. There can be many "wasted" null pointers at the lower levels of a simple trie. Using a linked list for a node may reduce the number of "wasted" pointers. A PATRICIA tree stores the differences between strings. On a path from the root, some string-positions may not be tested, but those that are must be tested in ascending order. Short strings can be treated as if padded out with a null-character, '-', say; note that the null-character cannot occur before real characters, e.g. ab- cannot be a prefix of ab-b..., say. Time-complexity of search, insert, & delete is O(|word|) for tries and PATRICIA trees -- independent of the number of items in the lookup table.
• [2-3- to B-trees], more height-balanced trees for implementing lookup tables. The B-tree is important in data-bases.
Prac 2/3 test data: Do also test your program on data other than that from gen2.c, including, e.g., 1, 2, 3, ..., 1000000 and 1000000, 999999, ..., 3, 2, 1. There is at least one solution "floating around" which is fast on random data (which is good) but slow (O(M×N)) on ascending or descending data (i.e. not perfect). It is not possible to set a hard and fast time-limit, such as 11.7 seconds say, because speed will vary with processor, cache, memory, disc, load, etc. What is more important is the trend when M and/or N are doubled, etc.

### Week 7 starts Monday 18 April [prac3(A)NB out of 12] [tute3]

• [String searching] -- naive-algorithm worst-case O(|txt|×|pat|)-time; Rabin's algorithm almost always O(|txt|)-time (but O(|txt|×|pat|) in worst case); Boyer-Moore algorithm on average O(|txt|/|pat|)-time (if |pat|<<|alphabet|) and O(|txt|) in the worst case. [demo]  It is easy to see that given a single pat[] to search for in a txt[], in general every character of txt[] must be examined, so O(|txt|) is a lower bound on the worst-case time-complexity of an algorithm for the problem; B-M actually achieves this and in practice it often becomes even faster the longer pat[] is! (We will see later that one can do better if there are many searches on the one txt[].) (NB. Re the B-M algorithm, 'sets' of a bounded enumerated type, such as chars, can be implemented by bit-maps and the set operations (intersection, union & membership) then take O(1)-time.)
• [Graphs], directed and undirected, weighted and unweighted, sparse and dense, examples from road maps (or web pages, or electrical circuits, or pipe networks, or air routes, or...)  [continued...

### Week 8 starts Tuesday 26 April

• ...continued]. (Note that adjacency matrices are efficient for representing dense graphs and adjacency lists are efficient for sparse graphs.)
[Path problems]: Dijkstra's 'single-source shortest paths' (SSSP) algorithm for weighted directed graphs and Warshall's 'transitive closure' (TC, connectedness) algorithm for unweighted (or weighted) directed graphs; continued...
 Dijkstra's: Source vertex vi. 'done' is a set of vertices. Initially done={vi}. Invariant (Inv): Algorithm has a tree of shortest paths from source vi to other vertices in done, and also knows all shortest paths of the form {vi, ..., vk, vu} where vi to vk are in done and vu is in V-done. Repeatedly, add to done the vertex, vc, that is in V-done and is closest to the source. Keep Inv true. [Not examinable: This algorithm often "reappears" to e.g. format text, find an optimal (compression) code for binomial data, segment time-series, and fit polygons.]
• ...Floyd's all-pairs shortest paths (APSP) algorithm. Note the similarities and differences between Warshall's TC algorithm and Floyd's APSP algorithm.
[MST], minimum spanning trees, Prim's algorithm. Note the similarities and differences between Dijkstra's SSSP algorithm and Prim's MST algorithm.

### Week 9 starts Monday 2 May [prac3(B)NB out of 12] [tute3]

• [MST], Kruskal's algorithm, worst-case O(|E|.log|E|)-time, i.e. beats Prim's algorithm on sparse (|E|<<|V|²) graphs but not on dense graphs. Kruskal's algorithm also provides another application of a priority queue / heap (lecture 5).
• [DAGs] - Directed Acyclic Graphs are an important special kind of graph. DAGs are rather like trees, indeed we saw a DAG used to represent common sub-expressions in lecture 2, and depth-first traversal of a DAG is useful, e.g. for the topological-sort (T-S) problem and the critical-path (C-P) problem; the latter problem is on vertex-weighted DAGs. (Note that a T-S algorithm can work either from sources to sinks, or from sinks back towards sources. Ditto for a C-P algorithm.)

### Week 10 starts Monday 9 May [prac4(A)] correction, [tute4]

Revision: There are past exam papers, notes, & tutorial questions under "past years" on the unit's [home page]. Also see "prerequisite knowledge" and the FAQ.
• [Suffix tree]. Logically this topic goes with tries, PATRICIAs (lecture 11) and string search (lecture 13) but it was held over so that we could cover graphs before prac4/5. Note what is examinable about suffix trees, and what is not. Use the demo., and you might prefer Woolloomooloo to Mississippi.
• Discussed solutions to prac2/3: (A) can be solved in O(N+M.log(N))-time and (B) in O(N.log(N)+M)-time; Q4.1 and Q4.3 of tute 1 are relevant to seeing this. There were many interesting solutions submitted that are fast for uniform random data but not for correlated data, e.g. 0,1,2,3,...,N-1 (or noisy correlated data), when they typically run in O(M*N)-time. And |j-i| being small causes problems for some submitted solutions. Generalise a solution to arbitrary N and M; don't have hard-coded constants or limits. It is important to test your program well, to find "nasty" cases that make your code crash, give wrong results or run inefficiently (that is 1/2-way to fixing the problem).

```
            0.0 0 1 1 0 0 1 1...
           ---------------------
1/5 = 101 ) 1.0 0 0 0 0 0 0 0...
                1 0 1
                -----
                  1 1 0
                  1 0 1
                  -----
                      1 1  etc.
```

Check: 1/5 (base 10) = 0.(0011) (base 2) = 3/16 + 3/16² + ... (geom. series) = 3×(1/(1-1/16) - 1) = 3/15 = 1/5

### Week 11 starts Monday 16 May [prac4(B)] [tute4]

• [Numerical], floating-point computation and the problems of limited accuracy. Babbage's difference engine had (has) 30 decimal places of accuracy (>90 binary); most modern computers have only about 6 (float) or 16 (double) decimal places. Many values, e.g. 1/3 and 1/5, cannot be represented exactly in fixed-length floating-point. Numerical errors can build up during a computation. It can be that 1.0-delta = 1.0 even when delta>0!  [Continued...
• ...integration] by rectangle, trapezoidal and Simpson's rules.
 Recursion ~ self reference. Recursion does not need named self-reference; it can be effected with the fixed-point operator Y (Y does not refer to itself & neither do G, g, or F, but g is applied to itself), also in SML and Algol68 -- not examinable.

### Week 12 starts Monday 23 May [prac5(A)] [tute5]

Revision: There are past exam papers, including notes, and tutorial questions under "past years" on the unit's [home page]. Also see "prerequisite knowledge" and the FAQ. And [help-room] timetable s1 2005 -- consultation.

### Week 13 starts Monday 30 May [prac5(B)] [tute5]

2005 © L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.
Created with "vi (Linux & Solaris)",   charset=iso-8859-1