Modelling-Alignment for Non-Random Sequences (AI2004)

David R. POWELL, Lloyd ALLISON, & Trevor I. DIX

AI2004, Springer Verlag, LNCS Vol.3339, pp.203-214, 2004.

Abstract. Populations of biased, non-random sequences may cause standard alignment algorithms to yield false-positive matches and false-negative misses. A standard significance test based on the shuffling of sequences is a partial solution, applicable to populations that can be described by simple models. Masking out low-information-content intervals throws information away. We describe a new and general method, modelling-alignment: population models are incorporated into the alignment process, which can (and should) lead to changes in the rank-order of matches between a query sequence and a collection of sequences, compared to results from standard algorithms. The new method is general and places very few conditions on the nature of the models that can be used with it. We apply modelling-alignment to local alignment, global alignment, optimal alignment and the relatedness problem.
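To make the shuffling-based significance test mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the paper's method and not the PRSS program: it assumes a plain Smith-Waterman local score with illustrative costs (match +1, mismatch -1, linear gap -2) and an empirical p-value taken from the shuffled scores. The names local_score and shuffle_p_value are hypothetical.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap cost).

    The scoring parameters are illustrative assumptions, not values
    taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_p_value(query, subject, trials=200, seed=0):
    """Empirical significance of a match by shuffling the subject.

    Shuffling keeps the letter composition of the subject (a 0-order
    model of it) and destroys everything else. The p-value is the
    fraction of shuffled subjects scoring at least as well as the
    real one, with add-one smoothing so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = local_score(query, subject)
    letters = list(subject)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if local_score(query, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # A biased, low-information pair: the score is high, but every
    # shuffle scores just as well, so the match is not significant.
    print(shuffle_p_value("A" * 10, "A" * 10, trials=20))
```

Because a shuffle preserves only composition, this kind of test can correct for simple compositional bias but cannot account for structure that needs a richer model, which is the limitation the abstract points to.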

Results: As expected, modelling-alignment and the standard PRSS program from the FASTA package have similar accuracy on sequence populations that can be described by simple models, e.g. 0-order Markov models. However, modelling-alignment has higher accuracy on populations that are mixed or that are described by higher-order models: it gives fewer false positives and false negatives, as shown by ROC curves and other results from tests on real and artificial data.
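For readers unfamiliar with ROC curves, the following Python sketch shows how one can be computed from a list of match scores and known related/unrelated labels. It is a generic illustration, not the evaluation code used for the paper; the function names roc_points and area_under_curve and the example data are made up.

```python
def roc_points(scores, labels):
    """Return ROC points (false-positive rate, true-positive rate).

    scores: higher means "more likely related".
    labels: True/1 for truly related pairs, False/0 for unrelated ones.
    The decision threshold is swept from the highest score downwards;
    tied scores are handled together so they yield a single point.
    """
    pos = sum(1 for lab in labels if lab)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Made-up example: two related pairs score above two unrelated ones.
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0])
    print(pts, area_under_curve(pts))
```

A method with fewer false positives and false negatives produces a curve closer to the top-left corner, and so a larger area under the curve.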

Availability: An implementation of the software is available via the Web.

Partially funded by Australian Research Council (ARC) grant A49800558.

Paper: [link]['04].
Preprint: [PP.ps]
Also see: [Comp.J.'99] and seminars [1], [2].

[Figure: example ROC curve]

© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)