CSE2304 [plan]

CSE2304, semester 1, 2006

 Notes index: [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [22], [24] NB. Numbers do not necessarily match lecture numbers exactly. A printed version of the notes is available at the bookshop.

Will change. Check regularly!

Lecture [timetable] shows Tuesday 11.00am (24/S5), Wednesday 2.00pm (11/H1), as of 12 February.
Tutorials: for impenetrable reasons, Allocate+ may refer to tutorials as 1-hour practicals 'mn-P2'.
Consultation: [Help-room] or after the Tuesday and Wednesday lectures at the lecture theatres; also see `when and where' on the [home-page].
Book: The bookshop has supplies of the recommended textbook as of 27 Feb, but there may be a gap in supplies until mid March.

Week 1, starts Monday 27 February

• [1, Introduction]
"[CSE2304], [FAQ], [progress] & [plan], Bib', Alg's, C
Instructions: Topics discussed in these lecture notes are examinable unless otherwise indicated. You need to follow instructions, take more notes & draw diagrams especially as [indicated] or as done in lectures, work through examples, and do extra reading. Hyper-links not in [square brackets] are mainly for revision, for further reading, and for lecturers of the subject."

• [2, Lists and trees]: An abstract datatype, such as List or Tree, specifies certain values, operations and rules on them that a correct implementation must satisfy. We can do algebra on abstract datatypes and algorithms, for example proving that List [append] is associative. A proof is much stronger than a test. We also analyse algorithms, for example the slow Tree [flatten] algorithm has O(n²) worst-case time-complexity; the fast [flatten] algorithm has O(n) time-complexity in all cases. Some of the tutorial questions practise proof and analysis.
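The fast [flatten] can be sketched in C (illustrative names, not the lecture's own code): the trick is to pass the "rest of the answer" along as an accumulator, so each Tree node is visited exactly once, giving O(n) time; naive append-based flattening re-walks list prefixes, giving O(n²) in the worst case.

```c
#include <stdlib.h>

/* A minimal sketch of the fast flatten (names hypothetical). */
typedef struct TreeNode { int elt; struct TreeNode *left, *right; } Tree;
typedef struct ListNode { int elt; struct ListNode *next; } List;

List *cons(int e, List *tail) {
    List *n = malloc(sizeof *n);
    n->elt = e; n->next = tail;
    return n;
}

Tree *node(int e, Tree *l, Tree *r) {
    Tree *t = malloc(sizeof *t);
    t->elt = e; t->left = l; t->right = r;
    return t;
}

/* flatten t, prepending the result to acc: each node is cons'd once */
List *flatten_acc(Tree *t, List *acc) {
    if (t == NULL) return acc;
    return flatten_acc(t->left, cons(t->elt, flatten_acc(t->right, acc)));
}
```

Calling flatten_acc(t, NULL) yields the in-order list of t; there is no call to an O(length) append anywhere, which is where the naive version loses.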

Week 2, starts Monday 6 March, [Prac0]

 From Admin: "[The] practical on Thurs at 3pm (05-P1) & Wed at 1pm (05-P2) has been cancelled. [If affected] please log in to Allocate+ & choose an alternative time. We apologise for any inconvenience." 6/3/06
• [3, Verification] of an algorithm: (i) Prove it correct if it terminates, and (ii) prove that it terminates. There are general templates for if and while; the latter makes use of an invariant. As an example, we applied this technique to the binary-search algorithm. By the way, adding a short-cut test for key==a[mid] in the binary-search algorithm slows it down on average.
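For revision, a sketch of binary search over a 0-based array (the lectures use a[1..N]); note the invariant, the termination argument in the comments, and the deliberate absence of a key==a[mid] short-cut:

```c
/* Sketch: binary search of sorted a[0..n-1] for key; returns its
   index, or -1 if absent. Invariant: if key is in a[0..n-1] then its
   index lies in [lo, hi]. Termination: lo <= mid < hi inside the
   loop, so either lo strictly increases or hi strictly decreases. */
int binary_search(const int a[], int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;        /* lo <= mid < hi */
        if (a[mid] < key) lo = mid + 1;      /* key, if present, is right of mid */
        else hi = mid;                       /* key, if present, is at or left of mid */
    }
    return (n > 0 && a[lo] == key) ? lo : -1;  /* one final equality test */
}
```

The single equality test after the loop is the point made above: testing key==a[mid] every iteration adds work to the common case to save a few iterations in the rare case.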
• [4, Simple sorts] selection sort (SS) and insertion sort (IS), as examples for verification and for examination of algorithm design. Note that the elements to be sorted are a[1..N] here. Weakening the invariant of SS's main-loop leads to IS which is about 2× faster on average, and even faster in the best-case. In IS, a[0]=-maxint (you can only use maxint when sorting ints) is an example of a sentinel. It ensures that the inner loop of IS terminates even if a[i] is smaller than all values in a[1..i-1]. Both SS and IS take O(1)-space; the array's space is not charged to the sort. IS takes O(N)-time in the best-case - when a[] is already sorted.
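A sketch of IS with the sentinel, assuming ints so that INT_MIN can play the role of -maxint:

```c
#include <limits.h>

/* Sketch: insertion sort of a[1..N], 1-based as in the lectures;
   a[0] holds the sentinel. Inner-loop invariant: a[j+2..i] > v and
   a[1..j] is sorted. The sentinel a[0] = INT_MIN guarantees the
   a[j] > v test fails by j == 0 at the latest, so no j >= 1 test
   is needed. */
void insertion_sort(int a[], int N) {   /* sorts a[1..N]; a[0] is spare */
    a[0] = INT_MIN;                     /* sentinel, smaller than any key */
    for (int i = 2; i <= N; i++) {
        int v = a[i], j = i - 1;
        while (a[j] > v) {              /* stops at the sentinel at worst */
            a[j + 1] = a[j];            /* shift larger elements right */
            j--;
        }
        a[j + 1] = v;
    }
}
```

On an already-sorted array the inner loop body never runs, which is the O(N) best-case noted above.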

Week 3, starts Monday 13 March, [Tute1]

 The algorithm at [Wff] can be used to check answers to tute1 Q1. You should have prepared, or be preparing, [prac1] for next week.
• [Heaps], priority queue and heap sort, O(n.log(n))-time. Note the relationship between selection sort (SS) and heap sort (HS): SS finds the smallest (could be largest) element in the unsorted section of the array and adds the element to the sorted section. HS is rather similar but finds the element* more quickly (log(n) v. linear time, for each one) by using a heap. HS does about 3n/2 of these operations. Note the pre- and post-conditions on downHeap.
*As it is usually coded, HS repeatedly finds the largest "remaining" element.
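A sketch of downHeap and HS (1-based array, max-heap, so the largest remaining element is swapped to the end each pass); illustrative code, not the lecture's:

```c
/* Sketch: downHeap on a[l..r], 1-based. Pre: the subtrees below l
   already satisfy the (max-)heap property. Post: a[l..r] is a heap.
   Children of position i are at 2i and 2i+1. */
void down_heap(int a[], int l, int r) {
    int v = a[l], i = l, child = 2 * l;
    while (child <= r) {
        if (child < r && a[child + 1] > a[child]) child++; /* larger child */
        if (v >= a[child]) break;       /* heap property restored */
        a[i] = a[child];                /* promote child, descend */
        i = child;
        child = 2 * i;
    }
    a[i] = v;
}

void heap_sort(int a[], int N) {        /* sorts a[1..N] */
    for (int l = N / 2; l >= 1; l--)    /* build heap, ~N/2 downHeaps */
        down_heap(a, l, N);
    for (int r = N; r >= 2; r--) {      /* ~N more: extract max each time */
        int t = a[1]; a[1] = a[r]; a[r] = t;
        down_heap(a, 1, r - 1);
    }
}
```

The two loops are the "about 3n/2 operations" above: roughly n/2 downHeaps to build the heap and n to extract, each O(log n).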
• [More fast sorts]: merge sort and quick sort are both examples of the divide and conquer paradigm. The former could also be called a "post-order" algorithm and the latter a "pre-order" algorithm. Note the invariant in Wirth's(?) efficient version of the partition operation. The Dutch National Flag problem yields a simpler partition operation that is slower by a constant factor.
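A sketch of quick sort with the efficient two-pointer partition in the style attributed to Wirth above (illustrative, not the lecture's code):

```c
/* Sketch: quick sort of a[lo..hi]. Partition invariant: elements
   left of i are <= pivot, elements right of j are >= pivot; the two
   scans meet with j < i, splitting the array for the two recursive
   calls (divide and conquer, "pre-order": partition, then recurse). */
void quick_sort(int a[], int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[(lo + hi) / 2], i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) i++;       /* find a left element >= pivot */
        while (a[j] > pivot) j--;       /* find a right element <= pivot */
        if (i <= j) {                   /* swap the out-of-place pair */
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++; j--;
        }
    }
    quick_sort(a, lo, j);
    quick_sort(a, i, hi);
}
```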

Week 4, starts Monday 20 March, [Prac1]

• [Edit distance] (ED) and the dynamic programming paradigm (DP). The basic ED algorithm takes O(|A|*|B|) time and space (O(min(|A|,|B|))-space if only the value of the distance is needed), and is an instance of a 2-dimensional DP Algorithm (DPA). We will see 1-dimensional DPAs later, computing shortest paths and minimum spanning trees of graphs.
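The basic 2-dimensional DPA can be sketched as follows (illustrative names; a fixed maximum size just keeps the sketch self-contained):

```c
#include <string.h>

/* Sketch of the O(|A|*|B|) edit-distance DPA with unit costs.
   D[i][j] = cost of editing A[0..i-1] into B[0..j-1]. */
#define MAXLEN 64

static int min3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
}

int edit_distance(const char *A, const char *B) {
    int D[MAXLEN + 1][MAXLEN + 1];
    int la = strlen(A), lb = strlen(B);
    for (int i = 0; i <= la; i++) D[i][0] = i;  /* delete i chars */
    for (int j = 0; j <= lb; j++) D[0][j] = j;  /* insert j chars */
    for (int i = 1; i <= la; i++)
        for (int j = 1; j <= lb; j++)
            D[i][j] = min3(D[i-1][j] + 1,                     /* delete  */
                           D[i][j-1] + 1,                     /* insert  */
                           D[i-1][j-1] + (A[i-1] != B[j-1])); /* match/change */
    return D[la][lb];
}
```

Since row i depends only on row i-1, keeping two rows (indeed one, updated carefully) gives the O(min(|A|,|B|))-space version when only the distance value is needed.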
• [Hirschberg's] space-efficient edit distance algorithm trades off time complexity (×2) to reduce space complexity, O(n²) -> O(n). It is an instance of divide and conquer, as well as dynamic programming.
 Time to start preparing [Prac2].

Week 5, starts Monday 27 March, [Tute2]

• [Lookup table] abstract data type, can be implemented by (i) unsorted array, (ii) sorted array, (iii) hashing (linear probing, quadratic probing, etc.), (iv) binary search tree (BST), and in many other ways. (It is not possible to say what is the best implementation of a lookup table for all situations -- it depends on how it is used and on the statistics of what is put into the table.) Continued . . .
• . . . finish BST.
[AVL trees]: if a tree were `weight balanced' then search would be O(log(n))-time at worst but this invariant cannot be maintained in O(log(n))-time by insertion and deletion. However `height balance' can be maintained in O(log(n))-time. [Practise with this [demo].]
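A sketch of implementation (iv), the plain (unbalanced) BST, for an int -> int table (names hypothetical). Search and insert cost O(height), which is exactly why the AVL height-balance invariant matters:

```c
#include <stdlib.h>

/* Sketch: unbalanced binary search tree as a lookup table. */
typedef struct BST { int key, val; struct BST *left, *right; } BST;

BST *bst_insert(BST *t, int key, int val) {
    if (t == NULL) {                       /* key absent: new leaf */
        t = malloc(sizeof *t);
        t->key = key; t->val = val;
        t->left = t->right = NULL;
    } else if (key < t->key) {
        t->left = bst_insert(t->left, key, val);
    } else if (key > t->key) {
        t->right = bst_insert(t->right, key, val);
    } else {
        t->val = val;                      /* key present: update */
    }
    return t;
}

int bst_lookup(const BST *t, int key, int *val) {  /* 1 iff found */
    while (t != NULL) {
        if (key == t->key) { *val = t->val; return 1; }
        t = (key < t->key) ? t->left : t->right;
    }
    return 0;
}
```

On sorted (correlated) input this tree degenerates into a list, O(n) per operation; AVL rebalancing keeps the height, and hence every operation, O(log n).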
 Solutions to tute-1 and tute-2 are on the 2nd-year notice board, bldg-75 --4/4/2006. Some [feedback] on prac-1.

Week 6, starts Monday 3 April, [Prac2]

• [Tries] and PATRICIAs: search, insert (and delete) a string, s, in O(|s|)-time regardless of the number of strings stored in the structure.
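A trie over lower-case letters can be sketched as follows (PATRICIA's compression of single-child chains is not shown; illustrative code, not the lecture's):

```c
#include <stdlib.h>

/* Sketch: trie over 'a'..'z'. Search and insert walk one node per
   character of s, so both are O(|s|) however many strings are stored. */
#define ALPHA 26
typedef struct Trie { int is_word; struct Trie *child[ALPHA]; } Trie;

Trie *trie_node(void) {
    return calloc(1, sizeof(Trie));        /* all children NULL */
}

void trie_insert(Trie *t, const char *s) {
    for (; *s; s++) {
        int c = *s - 'a';
        if (t->child[c] == NULL) t->child[c] = trie_node();
        t = t->child[c];
    }
    t->is_word = 1;                        /* mark end of a stored string */
}

int trie_search(const Trie *t, const char *s) {  /* 1 iff s was inserted */
    for (; *s; s++) {
        t = t->child[*s - 'a'];
        if (t == NULL) return 0;
    }
    return t->is_word;
}
```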
• [2-3, 2-3-4-, B-] trees implement lookup-tables: Search and insert in O(log_k n)-time. 2-3- leads to 2-3-4- which leads to B-tree, the last being a very important data structure. Making the "fan-out", k, larger makes the height of a B-tree smaller which makes the table operations faster, particularly when data is stored on disc.

Week 7, starts Monday 10 April, [Tute3]

 11/4/2006 Google has bought the rights to an [.au algorithm] from the UNSW School of Computer Science. And, of course, Google is famous for its [PageRank] algorithm.
• [String searching], search for pat[1..m] in txt[1..n], m<=n, and usually m<<n. Naive algorithm O(m*n)-time worst-case, Rabin's algorithm O(n)-time almost always (but O(m*n)-time in the rare worst-case), Boyer-Moore algorithm O(n/m)-time on average if m<<|alphabet| and O(n)-time worst-case.
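Rabin's algorithm can be sketched with a rolling hash (the base and modulus below are illustrative choices, not prescribed by the notes); strncmp confirms a match only when the hashes agree, which is why it is O(n) almost always but O(m*n) in the rare worst case of many false hash matches:

```c
#include <string.h>

/* Sketch: Rabin's string search; returns the first shift of pat
   in txt, or -1. Hash of a window = polynomial in base B, mod M. */
int rabin_search(const char *txt, const char *pat) {
    int n = strlen(txt), m = strlen(pat);
    if (m > n) return -1;
    const unsigned long B = 256, M = 1000000007UL;
    unsigned long hp = 0, ht = 0, Bm = 1;       /* Bm = B^(m-1) mod M */
    for (int i = 0; i < m - 1; i++) Bm = (Bm * B) % M;
    for (int i = 0; i < m; i++) {               /* hash pat and first window */
        hp = (hp * B + pat[i]) % M;
        ht = (ht * B + txt[i]) % M;
    }
    for (int s = 0; ; s++) {                    /* window is txt[s..s+m-1] */
        if (hp == ht && strncmp(txt + s, pat, m) == 0) return s;
        if (s + m >= n) return -1;
        /* roll: drop txt[s], shift, add txt[s+m] */
        ht = ((ht + M - (txt[s] * Bm) % M) * B + txt[s + m]) % M;
    }
}
```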
• [Graphs]: The main types of graph are {undirected | directed} × {unweighted | weighted}. The main ways of representing a graph in a computer are by an [adjacency matrix] or by [adjacency lists]. The former is space-efficient for a dense graph, the latter for a sparse graph. (The world wide web forms a very large graph, vertex~page, edge~hyper-link; it is sparse.)

Week 8, starts Monday 24 April [no prac or tute this week]

 Hints to prac3 may be added [here(click)]. [Tute5] has been revised.
• Anzac Day, Tuesday 25 April
• [Path problems] in weighted directed graphs: [Dijkstra's] single-source shortest paths algorithm, and [Warshall's] transitive closure (connectedness) algorithm, continued . . .
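Warshall's algorithm is short enough to sketch in full (adjacency-matrix representation, fixed small size for the sketch):

```c
/* Sketch: Warshall's transitive closure. Invariant: after pass k,
   reach[i][j] is 1 iff there is a path from i to j whose
   intermediate vertices all come from {0..k}. O(|V|^3) time. */
#define NV 4

void warshall(int reach[NV][NV]) {
    for (int k = 0; k < NV; k++)
        for (int i = 0; i < NV; i++)
            for (int j = 0; j < NV; j++)
                reach[i][j] = reach[i][j] || (reach[i][k] && reach[k][j]);
}
```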

Week 9, starts Monday 1 May, [Prac3]

• . . . [continued] with Floyd's all-pairs shortest paths (min,+) algorithm, which is very similar to Warshall's (or,and) transitive-closure algorithm; in fact it is possible to substitute other operators into its structure.
[Prim's algorithm] for the minimum spanning tree of a weighted undirected graph. Note the similarity of Dijkstra's paths alg. and Prim's MST alg. Their time-complexities, as given for adjacency matrices, are O(|V|²) (also see Q2, t2).
• [Kruskal's] algorithm for the minimum spanning tree of a weighted undirected graph uses a priority-Q, runs in O(|E|*log|E|)-time, is faster than Prim's alg. for a sparse graph when |E|<<|V|², but slower for a dense graph if |E|~|V|².
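Floyd's algorithm makes the operator substitution concrete: it is the same triple loop as Warshall's with (or,and) replaced by (min,+). A sketch (INF is an illustrative "no edge" value, kept small enough that INF+INF does not overflow):

```c
/* Sketch: Floyd's all-pairs shortest paths on an adjacency matrix.
   Invariant: after pass k, d[i][j] is the length of the shortest
   i-to-j path using intermediate vertices from {0..k}. */
#define NV 4
#define INF 1000000

void floyd(int d[NV][NV]) {
    for (int k = 0; k < NV; k++)
        for (int i = 0; i < NV; i++)
            for (int j = 0; j < NV; j++)
                if (d[i][k] + d[k][j] < d[i][j])   /* (min,+) step */
                    d[i][j] = d[i][k] + d[k][j];
}
```

Replacing (min,+) by (or,and) on booleans recovers Warshall's algorithm; other semiring-like operator pairs can be substituted in the same way.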

Week 10, starts Monday 8 May, [Tute4]

 The top- and medium-level [plan's progress] as of week-10. [Seminar] Computational Methods to Understand Cancer, Sarah Boyd, Thurs, 1pm, 75/G55.
• [DAGs], directed acyclic graphs, are "like" trees in that they have no cycles, but unlike them in that a vertex can have more than one "parent". Suitable variations on depth first traversal of a DAG can be used to find (i) a topological-sort of the DAG and (ii) a critical-path of a vertex-weighted DAG.
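The topological-sort variation on depth first traversal can be sketched as follows (small fixed graph size and illustrative names):

```c
/* Sketch: topological sort of a DAG by DFS. A vertex is written to
   the output only after all of its successors have been visited, so
   for every edge u->v, u appears before v in order[]. */
#define NV 4
int order[NV];          /* the topological order, filled right to left */
int next_slot;

void tsort_visit(int adj[NV][NV], int visited[NV], int u) {
    visited[u] = 1;
    for (int v = 0; v < NV; v++)
        if (adj[u][v] && !visited[v])
            tsort_visit(adj, visited, v);
    order[--next_slot] = u;             /* all successors already placed */
}

void tsort(int adj[NV][NV]) {
    int visited[NV] = {0};
    next_slot = NV;
    for (int u = 0; u < NV; u++)
        if (!visited[u])
            tsort_visit(adj, visited, u);
}
```

The critical-path computation is a similar traversal, accumulating the maximum weight over successors instead of an output position.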
• [Suffix tree] for txt[1..n] can be built in O(n) time and space. It can be used to search for pat[1..m] within txt in O(m) time. This is efficient if many searches are made on the one txt. A suffix tree can also find (i) longest repeated substring within txt, (ii) longest common substring (not subsequence) of txt1 and txt2, and (iii) longest palindrome within txt, each in linear time. [Try the demo; good examples include Woolloomooloo and abracadabra. And for interest only, [suffix arrays] are an alternative to suffix trees.]

Week 11, starts Monday 15 May, [Prac4]

•  Thurs 18th, 1pm, 75/G55: [Impossible Nature] by Jon McCormack. All invited.
[Re prac3: A range query, Q4 [tute3], on a [BST] can find earlier fragments that overlap a new fragment to the left, or to the right, or that it contains. But to find fragments that contain the new fragment needs a conjunction of two range queries, which can be slow. However, grouping fragments by length places limits on where their left ends must lie if they are to contain the new fragment, which can be used to reduce false positives. The grouping would have to be done adaptively as the set grows.
Alternatively, a BST can be augmented so that each node also holds the maximum (minimum) right (left) value of both its subtrees; google "interval trees". (If data contains correlations, a [balanced tree] would be necessary.) At least one solution managed to use a [Trie] successfully!]

[Numerical computing] 1/2 = 0.5_10 = 0.1_2, 1/4 = 0.25_10 = 0.01_2, 1/8 = 0.125_10 = 0.001_2, 1/16 = 0.0625_10 = 0.0001_2, 5/8 = 0.101_2. Note, 1/10_10 = 1/1010_2 = 0.0(0011)_2 recurring, which cannot be represented exactly in fixed- or floating-point binary with a limited number of digits. Numerical (in)accuracy. Solving f(x)=0. Continued . . .
• . . . [numerical integration] of f(x) by the rectangle, trapezoidal, and Simpson's rules.
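The trapezoidal and Simpson's rules can be sketched as follows (sq is an illustrative integrand; Simpson's rule requires an even number of subintervals and is exact for polynomials up to cubics):

```c
/* Sketch: numerical integration of f over [a,b] with n subintervals. */

double sq(double x) { return x * x; }   /* example integrand, x^2 */

/* Trapezoidal rule: end points weighted 1/2, interior points 1. */
double trapezoidal(double (*f)(double), double a, double b, int n) {
    double h = (b - a) / n;
    double s = (f(a) + f(b)) / 2;
    for (int i = 1; i < n; i++) s += f(a + i * h);
    return s * h;
}

/* Simpson's rule (n even): weights 1,4,2,4,...,2,4,1, times h/3. */
double simpson(double (*f)(double), double a, double b, int n) {
    double h = (b - a) / n;
    double s = f(a) + f(b);
    for (int i = 1; i < n; i++) s += (i % 2 ? 4 : 2) * f(a + i * h);
    return s * h / 3;
}
```

For x² on [0,1] the true integral is 1/3; Simpson's rule gets it exactly even with few subintervals, while the trapezoidal rule only approaches it as n grows.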
[Recursion] is related to self-reference. Some fast algorithms are naturally recursive e.g. fast multiplication of integers and matrices. Continued . . .

Week 12, starts Monday 22 May, [Tute5]

 Thurs 25th, 1pm, 75/G55: [Prior Convictions] by Steve Gardner. All invited.
• . . . [recursion] although 3-way and even 7-way recursion occur, they are rare; [linear-], binary- and n-ary-recursion are more common. Under certain assumptions we can give the time-complexity of linear recursion. To remove the recursion you have to introduce a stack, in general. But provided that there is no s2(x) after the recursive call, linear recursion can be replaced by a simple loop. Continued . . .
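The loop-replacement point can be sketched with a hypothetical example, summing the decimal digits of x: the recursive version does nothing after the call (no s2(x); an accumulator carries the partial result), so no stack is needed and it unwinds into a simple loop:

```c
/* Sketch: linear recursion with no work after the recursive call. */
int digit_sum_rec(int acc, int x) {
    if (x == 0) return acc;                      /* base case */
    return digit_sum_rec(acc + x % 10, x / 10);  /* nothing after the call */
}

/* The same computation as a loop: acc and x are simply updated
   in place on each iteration, no stack required. */
int digit_sum_loop(int x) {
    int s = 0;
    while (x != 0) {
        s += x % 10;
        x /= 10;
    }
    return s;
}
```

Had there been an s2(x) step after the call, the pending s2 work would have to be remembered, which in general is what the explicit stack is for.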
• . . . binary recursion, n-ary recursion, and the n-queens problem as an example application of the latter.
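The n-queens search is a compact example of n-ary recursion; a sketch that counts solutions (NQ and the names are illustrative):

```c
/* Sketch: count solutions to the NQ-queens problem. Level r of the
   recursion tries every column for the queen in row r - an n-way
   branch at each level, i.e. n-ary recursion. */
#define NQ 8
int col[NQ];                     /* col[r] = column of the queen in row r */

int safe(int r, int c) {         /* can a queen go at (r, c)? */
    for (int i = 0; i < r; i++)
        if (col[i] == c ||                   /* same column */
            col[i] - i == c - r ||           /* same / diagonal */
            col[i] + i == c + r)             /* same \ diagonal */
            return 0;
    return 1;
}

int queens(int r) {              /* rows 0..r-1 already placed */
    if (r == NQ) return 1;       /* a complete placement */
    int count = 0;
    for (int c = 0; c < NQ; c++)
        if (safe(r, c)) {
            col[r] = c;
            count += queens(r + 1);
        }
    return count;
}
```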
[Design principles] . . .

Week 13, starts Monday 29 May, [Prac5]

• [Design principles]: Only compute what is needed (e.g. use a heap/priority-Q for edges in Kruskal's MST). Do not compute anything more than once (if there is room to store it for reuse). Look for invariant conditions that can be exploited (e.g. selection sort -> insertion sort, or -> heap sort, and D[] -> U[] for edit-dist.). Balance work (usually). Try well known strategies such as divide and conquer (e.g. merge-sort, quick-sort, Hirschberg), dynamic programming (e.g. edit-distance, shortest paths, Prim's MST), greedy strategy (e.g. shortest paths, P's MST, K's MST).  Thurs 1st, 1pm, 75/G55: [Computers & Life] by Alan Dorin. All invited.
• Finished: No Wednesday lecture.

Exam

• Past examination papers can be found under "past years" on the unit's [home-page].
• Weeks +1 and +2, exam period help, (i) [help-room], (ii) LA, Tues. 9.30-12 and Wed. 9.30-11.
• (Regardless of hurdles, the advice is always: Sit the exam.)
• Do not write in red, or similar colours, in your exam answers!
• There are six questions on the paper, so spend about 1/2-hour on each question. Each question is worth 20 marks. Attempt all questions. Your total mark will be the sum of your four best answers plus half of your two other answers.
• Closed book: no books, no calculators, etc.

B.Comp.Sci., B.Dig.Sys., B.Software Eng., B.Sci., © 2006 L.Allison, Faculty of Information Technology (Clayton School), Monash University, Australia 3800.
Created with "vi (Linux & Solaris)",   charset=iso-8859-1