Molecular and Biochemical Parasitology 118(2) pp175-186, 2001, © December 2001 Elsevier Science B.V.

Discovering patterns in Plasmodium falciparum genomic DNA

Linda Sterna, Lloyd Allisonb, Ross L. Coppelc and Trevor I. Dixb

(a) Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Victoria 3010, Australia.
(b) School of Computer Science and Software Engineering, Monash University, Clayton, Victoria 3800, Australia.
(c) Department of Microbiology, Monash University, Clayton, Victoria 3800, Australia.

Should redirect, but if not: [doi:10.1016/S0166-6851(01)00388-7][5/'03] with full text in pdf.

home1 home2
 Bib
 Algorithms
 Bioinfo
 FP
 Logic
 MML
 Prog.Lang
and the
 Book

Bioinformatics
 Compression

Abstract: A method has been developed for discovering patterns in DNA sequences. Loosely based on the well-known Lempel Ziv model for text compression, the model detects repeated sequences in DNA. The repeats can be forward or inverted, and they need not be exact. The method is particularly useful for detecting distantly related sequences, and for finding patterns in sequences of biased nucleotide composition, where spurious patterns are often observed because the bias leads to coincidental nucleotide matches. We show here the utility of the method by applying it to genomic sequences of Plasmodium falciparum. A single scan of chromosomes 2 and 3 of P. falciparum, using our method and no other a priori information about the sequences, reveals regions of low complexity in both telomeric and central regions, long repeats in the subtelomeric regions, and shorter repeat areas in dense coding regions. Application of the method to a recently sequenced contig of chromosome 10 that has a particularly biased base composition detects a long internal repeat more readily than does the conventional dot matrix plot. Space requirements are linear, so the method can be used on large sequences. The observed repeat patterns may be related to large-scale chromosomal organization and control of gene expression. The method has general application in detecting patterns of potential interest in newly sequenced genomic material.

Keywords: Compression; information theory; repeated sequences; pattern discovery.

Coding Ockham's Razor, L. Allison, Springer

A Practical Introduction to Denotational Semantics, L. Allison, CUP

Linux
 Ubuntu
free op. sys.
OpenOffice
free office suite
The GIMP
~ free photoshop
Firefox
web browser

© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)
Created with "vi (Linux + Solaris)",  charset=iso-8859-1,  fetched Friday, 29-Mar-2024 18:50:47 AEDT.