[Subject home], [FAQ], [progress],   Bib', Alg's,  C , Java - L.A., Sunday, 28-Nov-2021 04:37:06 AEDT
Instructions: Topics discussed in these lecture notes are examinable unless otherwise indicated. You need to follow instructions, take more notes & draw diagrams especially as [indicated] or as done in lectures, work through examples, and do extra reading. Hyper-links not in [square brackets] are mainly for revision, for further reading, and for lecturers of the subject.

Design Techniques: Introduction

Here, some strategies to (try to) improve and analyse an algorithm:

• Do not repeat work.

• Do only what is necessary
• e.g. Kruskal's MST alg' needs "short" edges 1st, so use a priority Q rather than sorting all edges - improves constant.

• Try alternative representations of data & of data structures.

• "Balance" work.

Example, edit distance

• Holding partial results in D[ , ] matrix gives the O(n2 )-time[*]  dynamic programming algorithm.

• Can do better, e.g.

• O( n . d )-time worst-case,   O( n + d2 )-time average-case

• where d = distance(string1, string2)

• Ukkonen (1983)   and Miller & Myers (1988)   see Bib' if interested

• Very fast when [___________________]

• Note that diagonals of the   D[,]   matrix are non-decreasing . . .
. . . continued.   D[,] & U[,] (D's contours) are equivalent, i.e. alternative representations

 NB. Ukkonen's alg' is not examinable, although the design ideas are.

base case:
U[0, 0] = 0

general case:
U[dg, c] = max(

U[dg+1, c-1],

U[dg,     c-1] + 1,

U[dg-1, c-1] + 1 ) ;

while S1[ U[dg, c] ] = S2[ U[dg, c] - dg ] do
U[ dg, c ] ++
end_while

Iterate over diagonal `dg' within iterating over cost `c'.   NB. | dg | <= c.   Watch out for boundary conditions. Edit distance = min' c such that U[ |S1| - |S2|,  c ] = |S1|

• O( d2 )

Time

• Worst case:   O( n * d )

• Average case:   O( n + d2 )

• Best case:   O( n ) --when strings equal

where n is string length, d is edit distance.   Because [________________].

A Balanced Outlook

Example: Indexed sequential file
```I1| I2| I3| ... Im|
|   |   |       |
|   |   |       |
|   |   |       ---------------------|
|   |   |                            |
|   |   -----------|                 |
|   |              |                 |
|   -----|         |                 |
|        |         |                 |
|        |         |                 |
v        v         v                 v
e1 e2 e3 ....  ......   .....        en
```
• file of `n' element,   index of `m' elements

• How to choose m ?

Assuming that fetch and search, (a) on index and (b) on block pointed to, are proportional to size (with the same constant) :

• total time is proportional to m + n / m

differentiate w.r.t. m :

• 1 - n / m2

set to zero to find minimum time:

• n / m2 = 1

• m2 = n,     i.e.   m = sqrt( n )

index size and block size should be equal.

Balance in Divide and Conquer

• try to divide into two equal amounts of work

• e.g. merge-sort always O(n . log(n))-time

[why____________________?]

• quick-sort, depends on estimates of median

• good estimates => O(n . log(n))-time

• poor estimates => O(n2)-time

[why____________________?]

Greed is Good (sometimes):

• when we must make a number of choices for a complete solution

• greedy strategy is to make a "local" choice based on current information, without back-tracking.

Sometimes gives optimal solution, e.g.
• Dijkstra's single source shortest paths algorithm

• Prim's minimum spanning tree algorithm

• Kruskal's minimum spanning tree algorithm.

Sometimes gives a "good" solution to a (combinatorial) problem even if not guaranteed optimal (-- heuristic).

Estimating Complexity

When complexity is hard to analyse, can do empirical tests.

If T(n) = adnd + ad-1nd-1 + . . . + . . . a0 then:

• log( T(n) ) ~ log( a . nd ),   as n--->∞

 lecturer: plot log(T(n)) v. log(n); class: take notes
• = log(a) + d . log( n )

• the intercept is [________]

• the slope is [________]

NB. Must cover a large range of `n'.   e.g. 1000*n exceeds n2 unless [__________].

. . . estimating complexity.   If you believe or suspect that T(n) is THETA(f(n)) then:

• time, count, instrument, and . . .

• plot T(n)/f(n) v. n as n--->∞

• ratio can
1. ---> c     --T(n) is [__________]

2. ---> 0     --T(n) is [__________]

3. ---> ∞     --T(n) is [__________]

4. oscillate.     Also see Weiss p17.
NB. Must cover a large range of `n'. e.g. . . .
(e.g. an algorithm believed O(d**3)-time)

Design Techniques:   Summary

• Don't repeat work (store for re-use).

• Look for alternative representations.