Search engine run on: http://users.monash.edu.au/


Glookbib search for: MolBio protein structure LAllison

%A P. R. Amarasinghe
%A L. Allison
%A P. J. Stuckey
%A M. Garcia de la Banda
%A A. M. Lesk
%A A. S. Konagurthu
%T Getting 'φψχal' with proteins: minimum message length inference 
   of joint distributions of backbone and sidechain dihedral angles
%J Bioinformatics
%V 39 
%N s.1
%P i357-i367
%M JUN
%D 2023
%O ISMB/ECCB 23-27 July 2023, Lyon, France
%K jrnl, MolBio, Piyumi, c2023, c202x, c20xx, zz0723, phipsichial, ISMB, ECCB,
   protein structure, angle, mixture, von Mises, vonMises, Dunbrack,
   rotamer library, backbone, side chain, sidechain, statistical model,
   MML, mdl, LAllison, AMLesk, ArunK
%X "... We model the joint distribution of the observed mainchain & sidechain 
   dihedral angles ... by a mixture of a product of von Mises probability 
   distributions. ..."
   -- [doi:10.1093/bioinformatics/btad251]['23].
   Also see AK & [LCB][2023].
   [Also search for: MolBio MML].

%A S. Rajapaksa
%A D. Sumanaweera
%A A. M. Lesk
%A L. Allison
%A P. Stuckey
%A M. Garcia de la Banda
%A D. Abramson
%A A. S. Konagurthu
%T On the Reliability and Limits of Protein Sequence Alignments
%J Bioinformatics
%V 38
%N s.1
%P i255-i263
%M JUL
%D 2022
%O ISMB July 2022, Madison, USA
%K conf, ISMB, MolBio, c2022, c202x, c20xx, zz0722, sequence alignment,
   protein, structure, sequence, proximity, twilight, midnight, zone,
   AMLesk, LAllison, ArunK, Sandun
%X "... Using techniques not prev. applied to these questions, by weighting
   every possible seq. alignment by its posterior prob. we derive a formal
   math. expectation, & develop an efficient alg. for computation of the
   distance between alternative alignments ... By analyzing the seqs. &
   structures of 1 million protein domain pairs, we report the variation of the
   expected distance between seq.-based & structure-based alignments, as a fn
   of (Markov time of) seq. divergence. Our results clearly demarcate the
   'daylight', 'twilight' & 'midnight' zones for interpreting residue-residue
   correspondences from seq. information alone."
   -- [doi:10.1093/bioinformatics/btac247]['22].

%A S. Rajapaksa
%A D. Sumanaweera
%A M. Garcia de la Banda
%A P. Stuckey
%A D. Abramson
%A L. Allison
%A A. Lesk
%A A. Konagurthu
%T On identifying statistical redundancy at the level of amino acid subsequences
%J IEEE Int. Conf. on Bioinformatics and Biomedicine (BIBM)
%W Houston, USA
%M DEC
%D 2021
%K conf, BIBM, c2021, c202x, c20xx, zz0122, MolBio, LAllison, AMLesk, ArunK,
   protein, information content, stats, compress, compression, library,
   fragments, sequence, subsequence, structure, indels, source code
%X "... presents a framework to characterize & identify local sequences of
   proteins that are statistically redundant under the measure of Shannon
   information content while accounting for variations in their occurrences over
   evolutionary insertions, deletions, & substitutions of amino acids. The
   identification of such local seqs. provides insights for downstream studies
   on proteins. Here, we have applied our methods to amino acid seq. data sets
   derived from a database corr.to 935,552 substructural regions of varying
   sizes, covering 113,724 proteins from the protein data bank. The results
   identify, among others, a surjective mapping between 110,598 local seqs.
   (with an avg. length of 82 AAs/seq.) & 1,493 topological shapes. ..."
   -- [doi:10.1109/BIBM52615.2021.9669282]['22],
      [more].
   Also see supplementary@[lcb]['22].
   [Also search for: MolBio protein compression].

%A A. S. Konagurthu
%A R. Subramanian
%A L. Allison
%A D. Abramson
%A P. J. Stuckey
%A M. Garcia de la Banda
%A A. M. Lesk
%T Universal architectural concepts underlying protein folding patterns
%J Front. Mol. Biosci.,
   A Journey Through 50 Years of Structural Bioinformatics in
   Memoriam of Cyrus Chothia
%P ?-?
%M APR
%D 2021
%K jrnl, FMB, MolBio, bioinformatics, c2021, c202x, c20xx, zz0121,
   procodic, prosodic, 3D protein structure, fold, folds, tableau, motif,
   motifs, pattern, model, concept, structural, terms, fragment, supersecondary,
   local, dictionary, recurring, library, MML, II, ArunK, LAllison, AMLesk
%X "What is the architectural 'basis set' of the observed universe of protein
   structures? Using information-theoretic inference, we answer this Q. with a
   comprehensive dictionary of 1,493 substructures - called concepts - at a
   sub-domain level, based on an unbiased subset of known protein structs. ...
   An interactive site, PROCODIC, at ...[prosodic] provides access to
   & navigation of the entire dictionary of concepts, & all assoc. info.."
   -- [doi:10.3389/fmolb.2020.612920]['21],
      [more].

%A A. M. Lesk
%A A. S. Konagurthu
%A L. Allison
%A M. Garcia de la Banda
%A P. J. Stuckey
%A D. Abramson
%T Computer modelling of a potential agent against SARS-Cov-2 (COVID-19)
   protease
%J Proteins: Structure, Function and Bioinformatics
%V 88
%N 12
%P 1557-1558
%M JUL
%D 2020
%K jrnl, MolBio, c2020, c202x, c20xx, zz0720, SARS-CoV-2, SARSCoV2, protease,
   protein, Mpro, 3CLpro, nsp5, covid19, covalent, inSilico, ligand,
   inhibitor, Cys145, AMLesk, LAllison, ArunK
%X "We have modelled modifications of a known ligand to the SARS‐CoV‐2
   (COVID‐19) protease, that can form a covalent adduct, plus additional
   ligand‐protein hydrogen bonds."
   -- [doi:10.1002/prot.25980]['20]  (online 14/7/2020).
   [Also search for: SARSCoV2 protease].

%A A. M. Lesk
%A R. Subramanian
%A L. Allison
%A D. Abramson
%A P. J. Stuckey
%A M. G. de la Banda
%A A. S. Konagurthu
%T Universal architectural concepts underlying protein folding patterns
%J bioRxiv
%N 480194
%M DEC
%D 2018
%K TR, MolBio, c2018, c201x, c20xx, zz1120, AMLesk, LAllison, ArunK, protein,
   3D, structure, procodic, prosodic, fold, folding, concept, concepts,
   motif, pattern, substructure
%X "What is the architectural 'basis set' of the observed universe of protein
   structures? Using information-theoretic inference, we answer this question
   with a comprehensive dictionary of 1,493 substructural concepts. Each concept
   represents a topologically-conserved assembly of helices and strands that
   make contact. Any p.structure can be dissected into instances of concepts
   from this dictionary. We dissected the world-wide protein data bank ..."
   -- 480194@[bioRxiv]['18].

%A R. Subramanian
%A L. Allison
%A P. J. Stuckey
%A M. Garcia de la Banda
%A D. Abramson
%A A. M. Lesk
%A A. S. Konagurthu
%T Statistical compression of protein folding patterns for inference of
   recurrent substructural themes
%J Data Compression Conf. (DCC)
%I IEEE
%W Snowbird, Utah, USA
%P 340-349
%M APR
%D 2017
%K conf, DCC, MolBio, c2017, c201x, c20xx, zz0517, protein, tertiary, 3D,
   super secondary, structure, motif, motifs, pattern, blocks, library,
   discovery, description, aic, MDL, minimum message length, MML,
   LAllison, AMLesk, ArunK
%X "Computational analyses of the growing corpus of three-dimensional (3D)
   structures of proteins have revealed a limited set of recurrent substructural
   themes, termed super-secondary structures. Knowledge of super-secondary
   structures is important for the study of protein evolution and for the
   modeling of proteins with unknown structures. Characterizing a comprehensive
   dictionary of these super-secondary structures has been an unanswered
   computational challenge in protein structural studies. This paper presents an
   unsupervised method for learning such a comprehensive dictionary using the
   statistical framework of lossless compression on a database comprised of
   concise geometric representations of protein 3D folding patterns. The best
   dictionary is defined as the one that yields the most compression of the
   database. Here we describe the inference methodology and the statistical
   models used to estimate the encoding lengths. An interactive website for this
   dictionary is available ..."
   -- [more]
   &  [doi:10.1109/DCC.2017.46]['17].
   (Also see [protein].)

%A J. H. Collier
%A L. Allison
%A A. M. Lesk
%A P. J. Stuckey
%A M. Garcia de la Banda
%A A. S. Konagurthu
%T Statistical inference of protein structural alignments using information and
   compression
%J J. Bioinformatics
%I OUP
%V 33
%N 7
%P 1005-1013
%M APR
%D 2017
%O bioRxiv, June 2016
%K jrnl, OUP, MolBio, c2017, c201x, c20xx, zz0317, protein, alignment,
   tertiary structure, 3D, information, MML, MMLigner, software,
   JHC, JHCollier, ArunK, LAllison, AMLesk, AIC, bic, mdl
%X "... present here a statistical framework for the precise inference of
   structural alignments, built on the Bayesian and information-theoretic
   principle of Minimum Message Length (MML). The quality of any alignment is
   measured by its explanatory power—the amount of lossless compression achieved
   to explain the protein coordinates using that alignment. ..."
   -- [doi:10.1093/bioinformatics/btw757][2017] (online January 2017),
      [bioRxiv][6/2016],
      [more].
   (Also see MMLigner@[LCB][2016].)

%A A. S. Konagurthu
%A P. Kasarapu
%A L. Allison
%A J. H. Collier
%A A. M. Lesk
%T On sufficient statistics of least-squares superposition of vector sets
%J J. Comp. Biol.
%V 22
%N 6
%P 487-497
%M MAY
%D 2015
%K jrnl, JCB, MolBio, bioinformatics, ArunK, LAllison, AMLesk, JHC, JHCollier,
   c2015, c201x, c20xx, zz0116, 3D, point set, tertiary, protein, structure,
   structural alignment, superposition, match, matching, estimation, stats
%X "The problem of superposition of two corr. vector sets by minimizing their
   sum-of-squares error under orthogonal transformation ... can be solved
   exactly using an alg. whose time complexity grows linearly with the # of
   correspondences. ... particularly in studies involving macromolecular
   structs.. ... formally derives a set of suff.stats. for the least-squares
   superposition problem. These s. are additive. This permits a highly efficient
   (const. time) computation of superpositions (& s.stats.) of vector sets that
   are composed from its constituent v.sets under addition or deletion op.,
   where the s.stats. of the constituent sets are already known (that is, [they]
   have been previously superposed). ... a drastic improvement in the run time
   of the methods that commonly superpose v.sets under addition or deletion
   ops., where previously these ops. were carried out ab initio (ignoring the
   s.stats.). ... demonstrate the improvement our work offers in the context of
   protein structural alignment programs that assemble a reliable structural
   alignment from well-fitting (substructural) fragment pairs. A C++ library for
   this task is available online under an open-source license."
   -- [doi:10.1089/cmb.2014.0154]['16].
   (Based on the 2014 RECOMB paper.)

%A J. Collier
%A L. Allison
%A A. Lesk
%A M. Garcia de La Banda
%A A. Konagurthu
%T A new statistical framework to assess structural alignment quality using
   information compression
%J ECCB
%W Strasbourg
%M SEP
%D 2014
%K conf, ECCB 14, MolBio, c2014, c201x, c20xx, zz0914, LAllison, ArunK, AMLesk,
   JHCollier, protein, 3D, similar, structure, alignment, match, MML, MDL, AIC,
   complexity, bioinformatics, 13th Euro, Conf, Comp, Biology, I value, Ivalue
%X "... proposes a new statistical framework to assess structural alignment
   quality and significance based on lossless information compression. This is
   a radical departure from the traditional approach of formulating scoring
   functions. It links the structural alignment problem to the general class of
   statistical inductive inference problems, solved using the
   information-theoretic criterion of minimum message length. Based on this, we
   developed an efficient and reliable measure of structural alignment quality,
   I-value. The performance of I-value is demonstrated in comparison with a
   number of popular scoring functions, on a large collection of competing
   alignments. Our analysis shows that I-value provides a rigorous and reliable
   quantification of structural alignment quality, addressing a major gap in
   the field."
   -- [doi:10.1093/bioinformatics/btu460]['14],
      [more].

%A A. S. Konagurthu
%A P. Kasarapu
%A L. Allison
%A J. H. Collier
%A A. M. Lesk
%T On sufficient statistics of least-squares superposition of vector sets
%J RECOMB
%I SpringerVerlag
%S LNCS/LNBI
%V 8394
%M APR
%P 144-159
%D 2014
%K conf, RECOMB, MolBio, c2014, c201x, c20xx, zz0514, ArunK, LAllison, AMLesk,
   JHCollier, bioinformatics, RECOMB18, protein, structure, alignment,
   least squares, RMS, error, 3D, match, matching, additive, orthogonal, rigid,
   vector set, Kearsley, algorithm
%X "Superposition by orthogonal transformation of vector sets by minimizing the
   least-squares error is a fundamental task in many areas of science, notably
   in structural molecular biology. Its widespread use for structural analyses
   is facilitated by exact solns of this problem, computable in linear time.
   However, in several of these analyses it is common to invoke this
   superposition routine a very large number of times, often operating (through
   addition or deletion) on previously superposed vector sets. This paper
   derives a set of sufficient statistics for the least-squares orthogonal
   transformation problem. These sufficient statistics are additive. This
   property allows for the superposition parameters (rotation, translation, &
   root mean square deviation) to be computable as constant time updates from
   the statistics of partial solutions. We demonstrate that this results in a
   massive speed up in the computational effort, when compared to the method
   that recomputes superpositions ab initio. Among others, protein structural
   alignment algorithms stand to benefit from our results."
   -- [doi:10.1007/978-3-319-05269-4_11]['14],
      [more].
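   The additivity described in the abstract above can be illustrated with a
   minimal sketch. It assumes one common choice of summary (counts, coordinate
   sums, sums of squares and the 3x3 sum of outer products) and recovers the
   RMSD by a Kabsch-style SVD; the paper's own statistics and solver may
   differ, and the function names here are hypothetical.

```python
import numpy as np

def superposition_stats(X, Y):
    """Additive summary of corresponding point sets X and Y (each n x 3)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    return {
        "n": len(X),
        "sx": X.sum(axis=0),          # sum of x_i
        "sy": Y.sum(axis=0),          # sum of y_i
        "sxx": float((X * X).sum()),  # sum of |x_i|^2
        "syy": float((Y * Y).sum()),  # sum of |y_i|^2
        "sxy": X.T @ Y,               # sum of outer products x_i y_i^T
    }

def merge_stats(a, b):
    """Statistics of the union of two disjoint sets: plain addition."""
    return {k: a[k] + b[k] for k in a}

def rmsd_from_stats(s):
    """Least-squares RMSD under rotation + translation, from the summary only."""
    n = s["n"]
    # Centred cross-covariance and centred sums of squares.
    C = s["sxy"] - np.outer(s["sx"], s["sy"]) / n
    e0 = (s["sxx"] - s["sx"] @ s["sx"] / n) + (s["syy"] - s["sy"] @ s["sy"] / n)
    sing = np.linalg.svd(C, compute_uv=False)
    if np.linalg.det(C) < 0:          # restrict to proper rotations
        sing[-1] = -sing[-1]
    resid = e0 - 2.0 * sing.sum()
    return float(np.sqrt(max(resid, 0.0) / n))
```

   Once two fragments have been summarised, the cost of superposing their
   union no longer depends on the number of points: merge_stats adds a fixed
   number of values and rmsd_from_stats works on a 3x3 matrix.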

%A A. S. Konagurthu
%A L. Allison
%A D. Abramson
%A P. J. Stuckey
%A A. M. Lesk
%T How precise are reported protein coordinate data?
%J Acta Cryst.
%V D70
%N 3
%P 904-906
%M MAR
%D 2014
%K jrnl, MolBio, c2014, c201x, c20xx, zz0314, protein, 3D, tertiary, structure,
   precision, accuracy, PDB, ArunK, LAllison, AMLesk
%X "Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally
   reported to greater precision than the experimental structure determinations
   have actually achieved. By using information theory & data compression to
   study the compressibility of protein atomic coordinates, it is possible to
   quantify the amount of randomness in the coordinate data & thereby to
   determine the realistic precision of the reported coordinates. On avg., the
   value of each C_alpha coordinate in a set of selected p.structures solved at
   a variety of resolutions is good to about 0.1A."
   -- [doi:10.1107/S1399004713031787]['14],
      [more].

%A A. S. Konagurthu
%A A. M. Lesk
%A D. Abramson
%A P. J. Stuckey
%A L. Allison
%T Statistical inference of protein "LEGO bricks"
%J ICDM
%M DEC
%D 2013
%K conf, ICDM, ICDM13, MolBio, c2013, c201x, c20xx, zz1213, LAllison, ArunK,
   AMLesk, protein, tertiary, 3D, MML, structure, structures, recurrent,
   structural, motifs, backbone, folds, MDL, library, dictionary, fragment,
   blocks, fragments, bioinformatics, data mining
%X "Proteins are biomolecules of life. They fold into a great variety of
   three-dimensional (3D) shapes. Underlying these folding patterns are many
   recurrent structural fragments or building blocks (analogous to "LEGO(r)
   bricks"). This paper reports an innovative statistical inference approach to
   discover a comprehensive dictionary of protein structural building blocks
   from a large corpus of experimentally determined protein structures. Our
   approach is built on the Bayesian and information-theoretic criterion of
   minimum message length [MML]. To the best of our knowledge, this work is the
   first systematic and rigorous treatment of a very important data mining
   problem that arises in the cross-disciplinary area of structural
   bioinformatics. The quality of the dictionary we find is demonstrated by its
   explanatory power - any protein within the corpus of known 3D structures can
   be dissected into successive regions assigned to fragments from this
   dictionary. This induces a novel one-dimensional representation of
   three-dimensional protein folding patterns, suitable for application of the rich
   repertoire of character-string processing algorithms, for rapid
   identification of folding patterns of newly determined structures. This paper
   presents the details of the methodology used to infer the dictionary of
   building blocks, and is supported by illustrative examples to demonstrate its
   effectiveness and utility."
   -- [doi:10.1109/ICDM.2013.73]['14],
      [more], and
      1310.1462@[arXiv]['13].

%A A. S. Konagurthu
%A A. M. Lesk
%A L. Allison
%T Minimum message length inference of secondary structure from protein
   coordinate data
%J J. Bioinformatics
%I OUP
%V 28
%N 12
%P i97-i105
%M JUN
%D 2012
%O ISMB, Long Beach
%K conf, ISMB12, MolBio, c2012, c201x, c20xx, zz0612, LAllison, ArunK, AMLesk,
   SST, bioinformatics, protein, secondary structure, DSSP, assignment, helix,
   extended strand, sheet, coil, mmld, fold, MML, MDL, model
%X "Motivation: Secondary structure underpins the folding pattern and
   architecture of most proteins. Accurate assignment of the SS elts is
   therefore an important problem. Although many approx. solns of the SS
   assignment problem exist, the statement of the problem has resisted a
   consistent & math. rigorous defn. A variety of comparative studies have
   highlighted major disagreements in the way the available methods define &
   assign SS to coord.data.
   Results: We report a new method to infer SS based on the Bayesian method of
   Minimum Message Length (MML) inference. It treats assignments of SS as
   hypotheses that explain the given coord.data. The method seeks to maximise
   the joint probability of a hypothesis & the data. There is a natural null
   hypothesis & any assignment that cannot better it is unacceptable. We
   developed a program SST based on this approach & compared it to popular
   programs such as DSSP & STRIDE amongst others. Our evaln suggests that SST
   gives reliable assignments even on low resolution structures."
   -- [doi:10.1093/bioinformatics/bts223]['12].
   More: [www]['12].

%A A. S. Konagurthu
%A L. Allison
%A P. J. Stuckey
%A A. M. Lesk
%T Piecewise linear approximation of protein structures using the principle of
   minimum message length
%J J. Bioinformatics
%V 27
%N 13
%P i43-i51
%M JUL
%D 2011
%K conf, MolBio, MML, c2011, c201x, c20xx, zz0711, ISMB, LAllison, ArunK,
   AMLesk, protein, fold, cartoon, description, ribbon diagram, structure,
   segmentation, minimum message length, MDL, information theoretic
%X "Simple & concise representations of protein-folding patterns provide
   powerful abstractions for visualizations, comparisons, classifications,
   searching & aligning structural data. Structures are often abstracted by
   replacing standard secondary structural features - that is, helices & strands
   of sheet - by vectors or linear segments. Relying solely on std secondary
   structure may result in a sig. loss of structural information. Further,
   traditional methods of simplification crucially depend on the consistency &
   accuracy of external methods to assign SS to protein coord.data. Although
   many methods exist automatically to identify SS, the impreciseness of
   definitions, along with errors & inconsistencies in experimental structure
   data, drastically limit their applicability to generate reliable simplified
   representations, especially for structural comparison.
   This article introduces a mathematically rigorous alg. to delineate protein
   structure using the elegant statistical & inductive inference framework of
   minimum message length (MML). Our method generates consistent & statistically
   robust piecewise linear explanations of protein coordinate data, resulting in
   a powerful & concise representation of the structure. The delineation is
   completely independent of the approaches of using hydrogen-bonding patterns
   or inspecting local substructural geometry that the current methods use.
   Indeed, as is common with applications of the MML criterion, this method is
   free of parameters & thresholds, in striking contrast to the existing
   programs which are often beset by them.
   The analysis of results over a large number of proteins suggests that the
   method produces consistent delineation of structures that encompasses, among
   others, the segments corresponding to standard secondary structure."
   -- [doi:10.1093/bioinformatics/btr240]['11].
   (Also see [more].)

%A T. Edgoose
%A L. Allison
%A D. L. Dowe
%T An MML classification of protein structure that knows about angles and
   sequence
%J Pacific Symposium on Biocomputing '98
%P 585-596
%M JAN
%D 1998
%K conf, MolBio, PSB, PSB3, PSB98, von Mises, vonMises, angle, dihedral, class,
   cluster, clustering, HMM, SNOB, time series, timeSeries, ARC A49602504,
   LAllison, MDL, directional, distribution, bioinformatics, Monash,
   c1998, c199x, c19xx
%I World Scientific
%X SNOB + vonMises circular probability distribution + 1st order Markov model.
   phi-psi pairs give 17 classes and a class seq' correlation matrix.
   [paper]
   [paper.pdf@stanford.edu]['98]; uk us isbn:9810232780.
   von Mises, probability density:
      f(x | mu, kappa) = (1/(2.pi.I0(kappa))).exp(kappa.cos(x-mu))
                 where I0 is the modified Bessel function of the first kind,
                 order 0, so that 2.pi.I0(kappa) normalises the density.
   [Bioinformatics].
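   The von Mises density quoted in this entry can be evaluated directly. The
   sketch below is illustrative only: the function names are hypothetical,
   and I0 is summed from its power series, which is assumed adequate for
   moderate kappa (very large kappa would need a scaled Bessel routine).

```python
import math

def bessel_i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total = 0.0
    term = 1.0  # m = 0 term: (kappa/2)^0 / (0!)^2
    for m in range(terms):
        total += term
        # Ratio of successive terms: (kappa/2)^2 / (m+1)^2
        term *= (kappa / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def von_mises_pdf(x, mu, kappa):
    """f(x | mu, kappa) = exp(kappa.cos(x - mu)) / (2.pi.I0(kappa)), x in radians."""
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))
```

   With kappa = 0 the density reduces to the uniform 1/(2.pi) on the circle;
   larger kappa concentrates the mass around mu.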

%A D. L. Dowe
%A L. Allison
%A T. I. Dix
%A L. Hunter
%A C. S. Wallace
%A T. Edgoose
%T Circular clustering of protein dihedral angles by minimum message length
%J Pacific Symposium on Biocomputing '96
%M JAN
%P 242-255
%D 1996
%I World Scientific
%O TR 95/237, Dept. Computer Science, Monash University, Oct 1995
%K PSB, PSB96, TR 237, TR237, Monash, DLD, CSW, CSWallace, LAllison, MolBio,
   Monash, classification, angle, von Mises, vonMises, protein structure,
   inductive inference, II, MML, MDL, conf, bioinformatics, c1996, c199x, c19xx
%X L. Hunter - NLM, NIH. PSB '96: 3-6 Jan 1996, Hawaii; uk us isbn:9810225784.
   [paper],
   [paper.ps][1/'96],
   [[eProceedings]][1/'96].

%A D. L. Dowe
%A J. Oliver
%A L. Allison
%A T. I. Dix
%A C. S. Wallace
%T Learning rules for protein secondary structure prediction
%J Proc. 1992 Department Research Conf.
%I Dept. Computer Science, University of Western Australia
%E C. McDonald
%E J. Rohl
%E R. Owens
%M JUL
%D 1992
%O TR 92/163, Dept. Computer Science, Monash University, JUN '92
%K LAllison, CSW, DLD, Monash, UWA, WA, conf, MolBio, decision tree, trees,
   graph, protein, amino acid, AA, secondary structure, SS, prediction,
   rule, rules, alpha helix, beta strand, extended sheet, coil, turn,
   CSWallace, inductive inference, II, MML, minimum message length, c1992,
   c199x, c19xx, bioinformatics, TR 92 163, TR92-163, TR163
%X [TR92/163.ps]
   Also see [Bioinformatics],
   and TR 92/163.
   [CSci UWA home]['00]; uk us isbn:0864221959.

%A D. L. Dowe
%A J. Oliver
%A T. I. Dix
%A L. Allison
%A C. S. Wallace
%T A decision graph explanation of protein secondary structure prediction
%J 26th Hawaii Int. Conf. Sys. Sci.
%V 1
%P 669-678
%M JAN
%D 1993
%K LAllison, CSW, Monash, conf, MolBio, protein secondary structure prediction,
   conformation, alpha helix, ss, AA, beta sheet extended strand, turn, coil,
   II, inductive inference, decision graph tree, DTree, CSWallace, CSW,
   MML, Minimum message length encoding, description, MDL, Bayesian,
   TR163 163, c1993, c199x, c19xx, bioinformatics, HICSS, HICSS26, HICSS93
%X Oliver and Wallace (IJCAI '91) introduced `decision graphs' -
   a generalisation of decision trees - here applied to protein secondary
   structure prediction.
   [more],
   [paper (HTML)].
   Also see TR 92/163.

