  Phylogeny-aware identification and correction of taxonomically mislabeled sequences

Kozlov, A., Zhang, J., Yilmaz, P., Glockner, F., & Stamatakis, A. (2016). Phylogeny-aware identification and correction of taxonomically mislabeled sequences. Nucleic Acids Research (London), 44(11): 11, pp. 5022-5033.

Basic data

Genre: Journal article

Files

Yilmaz_2016.pdf (Publisher version), 4MB
Name: Yilmaz_2016.pdf
Description: -
OA status:
Visibility: Public
MIME type / Checksum: application/pdf / [MD5]
Technical metadata:
Copyright date: -
Copyright info: -
License: -

Creators

Creators:
Kozlov, A., Author
Zhang, J., Author
Yilmaz, P.1, Author
Glockner, F.1, Author
Stamatakis, A., Author
Affiliations:
1Microbial Genomics Group, Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Max Planck Society, ou_2481697

Content

Keywords: -
Abstract: Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences ('mislabels') using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria.
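
The abstract reports its accuracy as sensitivity and precision. As a reminder of what those two figures measure, here is a minimal sketch using the conventional definitions. The function names and the counts in the usage lines are hypothetical illustrations only; they are not taken from the paper or from the SATIVA software.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual mislabels that were flagged: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no actual positives: sensitivity is undefined")
    return true_pos / total


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged sequences that are real mislabels: TP / (TP + FP)."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("nothing was flagged: precision is undefined")
    return true_pos / total


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fp, fn = 90, 10, 30
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"precision   = {precision(tp, fp):.3f}")
```

In this reading, a method with high sensitivity misses few mislabeled sequences, while high precision means few correctly labeled sequences are flagged by mistake.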

Details

Language(s): eng - English
Date: 2016-06-20
Publication status: Published
Pages: 12
Place, publisher, edition: -
Table of contents: -
Review type: Internal review
Identifiers: eDoc: 732735
ISI: 000379753100015
Degree type: -

Source 1

Title: Nucleic Acids Research (London)
Other: Nucleic Acids Res
Source genre: Journal
Place, publisher, edition: Oxford : Oxford University Press
Pages: -
Volume / Issue: 44 (11)
Article number: 11
Start / end page: 5022 - 5033
Identifier: ISSN: 0305-1048
CoNE: https://pure.mpg.de/cone/journals/resource/110992357379342