




Annotated Alignments


Bais, Abha Singh
Max Planck Society


Bais, A. S. (2007). Annotated Alignments. PhD Thesis, Freie Universität Berlin, Berlin.

Elucidating the mechanisms of transcriptional regulation relies heavily on the sequence annotation of the binding sites of DNA-binding proteins called transcription factors. With the rationale that binding sites conserved across different species are more likely to be functional, the standard approach is to employ cross-species comparisons and focus the search on conserved regions. Usually, computational methods that annotate conserved binding sites perform the alignment and binding site annotation steps separately and combine the results at the end. If the binding site descriptions are weak or the sequence similarity is low, the local gap structure of the alignment makes the conserved sites hard to detect. In this thesis, we introduce a novel method that integrates the two axes of sequence conservation and binding site annotation in a simultaneous approach yielding annotated alignments: pairwise alignments with parts annotated as putative conserved transcription factor binding sites. Standard pairwise alignments are extended to include additional states for binding site profiles. A statistical framework that estimates profile-related parameters based on desired type I and type II errors is prescribed. This forms the core of the tool SimAnn. As an extension, we use existing probabilistic models to demonstrate how the framework can be adapted to consider position-specific evolutionary characteristics of binding sites during parameter estimation. This underlies the tool eSimAnn. Through simulations and real data analysis, we study how a simultaneous approach, as opposed to a multi-step one, influences the resulting predictions. The former enables a local rearrangement in the alignment structure to bring forth perfectly aligned binding sites. This removes the need for post-processing steps to handle errors in pre-computed alignments, as is usually done in multi-step approaches.
Additionally, the framework for parameter estimation is applicable to any novel profile of interest. Especially for instances with poor sequence conservation or profile quality, the simultaneous approach stands out. As a by-product of the analysis, we also model the annotated alignment problem as an extended pair Hidden Markov Model and illustrate the correspondence between the various theoretical concepts.
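
The idea of extending a pairwise alignment with an additional state for a binding site profile can be illustrated with a minimal sketch. It is not the SimAnn implementation: the function names, the scoring values, the example profile `TATA_PWM`, and the fixed `site_penalty` are all assumptions made for illustration, and the penalty merely stands in for the profile-related parameters that the thesis estimates from desired type I and type II errors. The sketch adds one extra move to a standard global alignment recursion: a gapless block, as wide as the profile, that is scored by the profile log-odds in both sequences.

```python
import math

# Hypothetical profile for the motif "TATA": one probability column per position.
TATA_PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

def log_odds(pwm, site, background=0.25):
    """Log-odds score of `site` under the profile versus a uniform background."""
    return sum(math.log(col[base] / background) for col, base in zip(pwm, site))

def annotate_align(x, y, pwm, match=1.0, mismatch=-1.0, gap=-2.0, site_penalty=2.0):
    """Global alignment of x and y with an extra 'profile' move.

    Returns the best score and the (start_x, start_y) positions of blocks
    annotated as conserved sites. All scoring values are illustrative.
    Assumes sequences over A, C, G, T and a profile of width >= 1.
    """
    n, m, w = len(x), len(y), len(pwm)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "X"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "Y"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + sub, "M"),  # aligned pair
                (score[i - 1][j] + gap, "X"),      # gap in y
                (score[i][j - 1] + gap, "Y"),      # gap in x
            ]
            if i >= w and j >= w:
                # Profile move: a gapless block of width w in both sequences.
                p = (log_odds(pwm, x[i - w:i]) + log_odds(pwm, y[j - w:j])
                     - site_penalty)
                options.append((score[i - w][j - w] + p, "P"))
            score[i][j], move[i][j] = max(options)

    # Trace back to collect the annotated blocks.
    sites, i, j = [], n, m
    while i > 0 or j > 0:
        mv = move[i][j]
        if mv == "P":
            sites.append((i - w, j - w))
            i, j = i - w, j - w
        elif mv == "M":
            i, j = i - 1, j - 1
        elif mv == "X":
            i -= 1
        else:
            j -= 1
    return score[n][m], sites[::-1]

best, sites = annotate_align("CCTATAGG", "CCTATAGG", TATA_PWM)
print(sites)  # [(2, 2)]
```

Because the profile move competes with ordinary match and gap moves inside the same recursion, the alignment around a candidate site can shift to accommodate it, which is the behaviour a separate align-then-annotate pipeline cannot reproduce without post-processing.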