Fast computation of genome distances

Klötzl, Fabian

DetailsÜbersicht

Fast computation of genome distances

Klötzl, F. (2020). Fast computation of genome distances. PhD Thesis, Universität zu Lübeck. Institut für Neuro- und Bioinformatik., Lübeck.

Item is Freigegeben

einblenden: alle ausblenden: alle

Basisdaten

einblenden: ausblenden:

Datensatz-Permalink: https://hdl.handle.net/21.11116/0000-0007-9EAB-7 Versions-Permalink: https://hdl.handle.net/21.11116/0000-0007-9EAC-6

Genre: Hochschulschrift

Dateien

einblenden: Dateien

ausblenden: Dateien

:

thesis.digital.pdf (beliebiger Volltext), 8MB

Datei-Permalink:
-

Name:
thesis.digital.pdf

Beschreibung:
-

OA-Status:

Sichtbarkeit:
Privat

MIME-Typ / Prüfsumme:
application/pdf

Technische Metadaten:

Copyright Datum:
-

Copyright Info:
-

Lizenz:
-

Externe Referenzen

einblenden:

Urheber

einblenden:

ausblenden:

Urheber:
Klötzl, Fabian¹, Autor
Haubold, Bernhard¹, Gutachter
Tantau, Till, Gutachter

Affiliations:
1Research Group Bioinformatics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society, ou_1445644

Inhalt

einblenden:

ausblenden:

Schlagwörter: -

Zusammenfassung: To understand the evolutionary relationships between organisms, they are typicallypresented in a tree-like structure, a phylogeny. In genomic studies, phylogenies aretraditionally reconstructed from a multiple sequence alignment. While most accurate,this approach is also computationally demanding. The problem is that in order to identifyshared homologies, the sequences are usually first aligned nucleotide by nucleotide.This alignment step has become a bottleneck in the practice of molecular biology, wherethousands of whole bacterial genomes, each a few megabases long, are sequenced andthen need to be summarized as phylogenies when analyzing pathogen outbreaks.One alternative are methods that estimate evolutionary distances directly from un-aligned genomes. These pairwise distances can then be used to cluster sequences in atree. Most of these alignment-free methods heavily rely on exact matching techniques forwords of a fixed size for fast sequence comparison. However, they usually do not reflectthe substitution rate, the most widely used measure of evolutionary distance.Instead of using words of fixed size, Haubold et al. (2015) used matches of maximallength as anchors for approximate pairwise alignments. These anchor alignments then canbe used to estimate the substitution rate. A first implementation,andi, quickly estimatesaccurate pairwise distances from hundreds of bacterial genomes on standard hardware.However, the thousands of genomes currently being collected during outbreaks againslow the program down.Andiuses a suffix array as a full-text index for each of the input sequences. Since con-structing and searching in a suffix array is slow, the aim of this thesis was to investigate,whether it might be possible to just compute a single suffix array for one of the inputsequences and pile all remaining sequences onto that reference. This should produce anapproximate multiple sequence alignment, from which pairwise mismatches could becounted.This approach is implemented in the programphylonium(Klötzl and Haubold 2019). Itis available via package managers or as open source atgithub.com/evolbioinf/phylonium.Phyloniumis much faster thanandiwhile losing little of its predecessor ’s accuracy. In thisthesis I explain the background tophylonium, describe its implementation, and applyit to simulated and real data. In the application section I comparephyloniumto its bestcompetitors and show that it holds a reasonable position in the classical trade-off betweenspeed and accuracy.

Details

einblenden:

ausblenden:

Sprache(n): eng - English

Datum: Angenommen: 2020-10-16Erschienen: 2020-10-16

Publikationsstatus: Erschienen

Seiten: xv, 95

Ort, Verlag, Ausgabe: Lübeck : Universität zu Lübeck. Institut für Neuro- und Bioinformatik.

Inhaltsverzeichnis: -

Art der Begutachtung: -

Identifikatoren: Anderer: Diss/13341

Art des Abschluß: Doktorarbeit

Datensatz

Basisdaten

Dateien

Externe Referenzen

Urheber

Inhalt

Details

Veranstaltung

Entscheidung

Projektinformation

Quelle