[Subject home], [FAQ], [progress],   Bib', Alg's,  C , Java - L.A., Sunday, 28-Nov-2021 04:33:08 AEDT
Instructions: Topics discussed in these lecture notes are examinable unless otherwise indicated. You need to follow instructions, take more notes & draw diagrams especially as [indicated] or as done in lectures, work through examples, and do extra reading. Hyper-links not in [square brackets] are mainly for revision, for further reading, and for lecturers of the subject.

## String Searching:   Suffix Tree:   Introduction

• Recall String searching for a `pattern' in a `text'.

• Good algorithms take O( | text | )-time.

• It turns out that we can do search in   O( | pattern | )-time   if we spend some pre-processing time to build an index of some kind,

• e.g. a suffix tree . . .

### Definitions,   given txt[1..n]

• A prefix is a substring   txt[1..j]

• A suffix is a substring   txt[i..n]

• A substring is   txt[i..j],   so

• a substring is a prefix of a suffix   (or equivalently a [________] of a [________].)

• There are   n * (n+1) / 2   non-empty substrings of txt but

• an "index" of the n suffices is enough to find any substring

### e.g. Consider the suffices of   txt[1..n] = ``mississippi'' :

(or maybe ``Woolloomooloo'')
```T1  = mississippi = txt
T2  = ississippi
T3  = ssissippi
T4  = sissippi
T5  = issippi
T6  = ssippi
T7  = sippi
T8  = ippi
T9  = ppi
T10 = pi
T11 = i
```

### Could sort (pointers to) suffixes

```T11 = i
T8  = ippi
T5  = issippi
T2  = ississippi
T1  = mississippi
T10 = pi
T9  = ppi
T7  = sippi
T4  = sissippi
T6  = ssippi
T3  = ssissippi
```
Sort at least O(n.log(n))-time,   search at least O(log(n))-time. Question: What would be worst case?

### Suffix Tree

• is a special data structure

• a tree, edges labelled with substrings

• is related to Trie and PATRICIA

• can be built, right to left, in O(n)-time (Weiner 1973, McCreight 1976) and

• left to right, in O(n)-time (Ukkonen 1992, 1995)

• which is good because [________________________________]

• O(n)-space, but a large constant, so research into variations and alternatives is still current.
suffix tree for mississippi   [lecturer: do a search; class: take notes!]
```tree-->|---mississippi
|
|---i-->|---ssi-->|---ssippi
|       |         |
|       |         |---ppi
|       |
|       |---ppi
|
|---s-->|----si-->|---ssippi
|       |         |
|       |         |---ppi
|       |
|       |----i--->|---ssippi
|                 |
|                 |---ppi
|--p-->|---pi
|
|---i
```
[lecturer: consider building an ST for a different string e.g. ``Woolloomooloo'']

### How to build a Suffix Tree?

```           tree1

tree-->----m...

```
Adding the second character to get `mi' there are now suffixes `mi' and `i':
```           tree2

tree-->|---mi...
|
|---i...
```

Next `mis'

```tree-->|---mis...                       -- tree3
|
|---is...
|
|---s...
```

There is no need to add any more splits for `miss' because `s' is part of `ss'.

```tree-->|---miss...                      -- tree4
|
|---iss...
|
|---ss...
```

However, with `missi' there must be a split . . .

. . . split because one `s' is followed by `i', the other by `s'

```tree-->|---missi...                     -- tree5
|
|---issi...
|
|---s-->|---si...
|
|---i...
```
[lecturer: demonstrate uses; class: take notes!]

### Uses of Suffix Tree

• Search for pat[1..m] in txt[1..n]

• preprocessing O(n)-time

• search O(m)-time

• Longest repeated substring in txt.

• Longest common substring of txt1 and txt2.

• Longest palindrome in txt.

## String Searching:   Suffix Tree:   Summary

• Can be built in O(n)-time (Weiner 1973, McCreight 1976, Ukkonen 1992, 1995; see bib')

• search for pat[1..m] then takes only O(m)-time.

• Compare with other String Search methods.

•  NB. Algorithm to build a S.T. in O(n)-time is not examinable (but to build a S.T. as above "by hand", and applications of S.T. are examinable).
© L.Allison, Sunday, 28-Nov-2021 04:33:08 AEDT users.monash.edu.au/~lloyd/tilde/CSC2/CSE2304/Notes/19suffix.shtml
Computer Science, Software Engineering, Science (Computer Science).