  Fast computation of genome distances

Klötzl, F. (2020). Fast computation of genome distances. PhD Thesis, Universität zu Lübeck, Institut für Neuro- und Bioinformatik, Lübeck.

Basic data

Genre: Thesis

Files

thesis.digital.pdf (any full text), 8 MB
File permalink: -
Name: thesis.digital.pdf
Description: -
OA status:
Visibility: Private
MIME type / checksum: application/pdf
Technical metadata:
Copyright date: -
Copyright info: -
License: -

Creators

Creators:
Klötzl, Fabian (1), Author
Haubold, Bernhard (1), Referee
Tantau, Till, Referee
Affiliations:
(1) Research Group Bioinformatics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society, ou_1445644

Content

Keywords: -
Abstract: To understand the evolutionary relationships between organisms, they are typically presented in a tree-like structure, a phylogeny. In genomic studies, phylogenies are traditionally reconstructed from a multiple sequence alignment. While most accurate, this approach is also computationally demanding. The problem is that in order to identify shared homologies, the sequences are usually first aligned nucleotide by nucleotide. This alignment step has become a bottleneck in the practice of molecular biology, where thousands of whole bacterial genomes, each a few megabases long, are sequenced and then need to be summarized as phylogenies when analyzing pathogen outbreaks.

One alternative are methods that estimate evolutionary distances directly from unaligned genomes. These pairwise distances can then be used to cluster sequences in a tree. Most of these alignment-free methods heavily rely on exact matching techniques for words of a fixed size for fast sequence comparison. However, they usually do not reflect the substitution rate, the most widely used measure of evolutionary distance.

Instead of using words of fixed size, Haubold et al. (2015) used matches of maximal length as anchors for approximate pairwise alignments. These anchor alignments then can be used to estimate the substitution rate. A first implementation, andi, quickly estimates accurate pairwise distances from hundreds of bacterial genomes on standard hardware. However, the thousands of genomes currently being collected during outbreaks again slow the program down.

Andi uses a suffix array as a full-text index for each of the input sequences. Since constructing and searching in a suffix array is slow, the aim of this thesis was to investigate whether it might be possible to just compute a single suffix array for one of the input sequences and pile all remaining sequences onto that reference. This should produce an approximate multiple sequence alignment, from which pairwise mismatches could be counted.

This approach is implemented in the program phylonium (Klötzl and Haubold 2019). It is available via package managers or as open source at github.com/evolbioinf/phylonium. Phylonium is much faster than andi while losing little of its predecessor's accuracy. In this thesis I explain the background to phylonium, describe its implementation, and apply it to simulated and real data. In the application section I compare phylonium to its best competitors and show that it holds a reasonable position in the classical trade-off between speed and accuracy.
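
The last step described in the abstract, counting pairwise mismatches in sequences piled onto a common reference, can be illustrated with a minimal sketch. This is not the phylonium implementation. It assumes, hypothetically, that each sequence has already been projected onto reference coordinates as a string of equal length, with `-` marking reference positions it does not cover, and it assumes the Jukes-Cantor correction to turn a mismatch proportion into a substitution rate; the abstract does not name a substitution model. The function names `jukes_cantor` and `pairwise_distances` are made up for this illustration.

```python
import math
from itertools import combinations


def jukes_cantor(p):
    """Jukes-Cantor distance for a mismatch proportion p (assumed model)."""
    if p >= 0.75:
        return float("nan")  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def pairwise_distances(rows):
    """Estimate distances between all pairs of sequences.

    rows maps a sequence name to a string in reference coordinates;
    '-' marks positions the sequence does not cover (hypothetical format).
    """
    dist = {}
    for a, b in combinations(sorted(rows), 2):
        compared = 0
        mismatches = 0
        for x, y in zip(rows[a], rows[b]):
            if x == "-" or y == "-":
                continue  # only positions covered by both sequences count
            compared += 1
            if x != y:
                mismatches += 1
        if compared == 0:
            dist[(a, b)] = float("nan")  # no shared positions to compare
        else:
            dist[(a, b)] = jukes_cantor(mismatches / compared)
    return dist


rows = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "--GTACGT"}
distances = pairwise_distances(rows)
```

Because every sequence is compared in the coordinates of one reference, only a single index over that reference would be needed, which is the source of the speed-up the abstract describes; positions covered by only one sequence of a pair are skipped rather than counted as differences.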

Details

Language(s): eng - English
Date: 2020-10-16
Publication status: Issued
Pages: xv, 95
Place, publisher, edition: Lübeck : Universität zu Lübeck. Institut für Neuro- und Bioinformatik.
Table of contents: -
Review type: -
Identifiers: Other: Diss/13341
Degree type: Doctoral thesis
