----- NEEDS JAVASCRIPT ON. -----
[Subject home ],
[FAQ ],
[progress ],
Bib' ,
Alg's ,
C ,
Java
- L.A. ,
Friday, 29-Mar-2024 07:30:32 AEDT
Instructions :
Topics discussed in these lecture notes are examinable
unless otherwise indicated.
You need to follow instructions,
take more notes &
draw diagrams especially as [indicated] or
as done in lectures,
work through examples, and
do extra reading.
Hyper-links not in [square brackets] are mainly for revision,
for further reading, and for lecturers of the subject.
String Searching:
Suffix Tree :
Introduction
Recall
String
searching for a `pattern' in a `text'.
Good algorithms take O( | text | )-time.
It turns out that we can do search in
O( | pattern | )-time if we spend some
pre-processing time to build an index of some kind,
e.g. a suffix tree . . .
Definitions, given txt[1..n]
A prefix is a substring txt[1..j]
A suffix is a substring txt[i..n]
A substring is txt[i..j], so
a substring is a prefix of a suffix
(or equivalently a [________] of a [________].)
There are
n * (n+1) / 2 non-empty substrings of txt but
an "index" of the n suffices is enough to find any substring
e.g. Consider the suffices of txt[1..n] = ``mississippi'' :
(or maybe ``Woolloomooloo '')
T1 = mississippi = txt
T2 = ississippi
T3 = ssissippi
T4 = sissippi
T5 = issippi
T6 = ssippi
T7 = sippi
T8 = ippi
T9 = ppi
T10 = pi
T11 = i
Could sort (pointers to ) suffixes
T11 = i
T8 = ippi
T5 = issippi
T2 = ississippi
T1 = mississippi
T10 = pi
T9 = ppi
T7 = sippi
T4 = sissippi
T6 = ssippi
T3 = ssissippi
Sort at least O(n.log(n))-time,
search at least O(log(n))-time.
Question: What would be worst case?
is a special data structure
a tree, edges labelled with substrings
is related to Trie and PATRICIA
can be built, right to left, in O(n)-time
(Weiner 1973, McCreight 1976) and
left to right, in O(n)-time (Ukkonen 1992, 1995)
which is good because [________________________________]
O(n)-space, but a large constant, so research into
variations and alternatives is still current.
suffix tree for mississippi
[lecturer: do a search; class: take notes!]
tree-->|---mississippi
|
|---i-->|---ssi-->|---ssippi
| | |
| | |---ppi
| |
| |---ppi
|
|---s-->|----si-->|---ssippi
| | |
| | |---ppi
| |
| |----i--->|---ssippi
| |
| |---ppi
|--p-->|---pi
|
|---i
[lecturer: consider building an ST for a different string
e.g. ``Woolloomooloo '']
tree1
tree-->----m...
Adding the second character to get `mi'
there are now suffixes `mi' and `i':
tree2
tree-->|---mi...
|
|---i...
Next `mis'
tree-->|---mis... -- tree3
|
|---is...
|
|---s...
There is no need to add any more splits for `miss'
because `s' is part of `ss'.
tree-->|---miss... -- tree4
|
|---iss...
|
|---ss...
However, with `missi' there must be a split . . .
. . . split because one `s' is followed by `i', the other by `s'
tree-->|---missi... -- tree5
|
|---issi...
|
|---s-->|---si...
|
|---i...
[lecturer: demonstrate uses; class: take notes!]
Search for pat[1..m] in txt[1..n]
preprocessing O(n)-time
search O(m)-time
Longest repeated substring in txt.
Longest common substring of txt1 and txt2 .
Longest palindrome in txt.
String Searching:
Suffix Tree :
Summary
Can be built in O(n)-time
(Weiner 1973, McCreight 1976, Ukkonen 1992, 1995;
see bib' )
search for pat[1..m] then takes only O(m)-time.
Compare with other
String Search
methods.
NB. Algorithm to build a S.T. in O(n)-time
is not examinable
(but to build a S.T. as above "by hand", and
applications of S.T. are examinable).
© L.Allison ,
Friday, 29-Mar-2024 07:30:32 AEDT
users.monash.edu.au/~lloyd/tilde/CSC2/CSE2304/Notes/19suffix.shtml
Computer Science,
Software Engineering,
Science (Computer Science).