Phylogeny-aware identification and correction of taxonomically mislabeled sequences

Kozlov, A.; Zhang, J.; Yilmaz, P.; Glockner, F.; Stamatakis, A.

Kozlov, A., Zhang, J., Yilmaz, P., Glockner, F., & Stamatakis, A. (2016). Phylogeny-aware identification and correction of taxonomically mislabeled sequences. Nucleic Acids Research (London), 44(11), pp. 5022-5033.

Item is public

Basic information

Item permalink: https://hdl.handle.net/21.11116/0000-0001-C2C5-5
Version permalink: https://hdl.handle.net/21.11116/0000-0005-5506-5

Genre: Journal article

Files

Yilmaz_2016.pdf (publisher version), 4MB

File permalink:
https://hdl.handle.net/21.11116/0000-0005-5507-4

File name:
Yilmaz_2016.pdf

Description:
-

OA-Status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright information:
-

CC license:
-

Creators

Creators:
Kozlov, A., Author
Zhang, J., Author
Yilmaz, P.¹, Author
Glockner, F.¹, Author
Stamatakis, A., Author

Affiliations:
¹ Microbial Genomics Group, Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Max Planck Society, ou_2481697

Content

Keywords: -

Abstract: Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences ('mislabels') using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria.

Details

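Language: eng - English

Dates: Published: 2016-06-20

Publication status: Published

Pages: 12

Publishing info: -

Table of contents: -

Review: Peer-reviewed (internal)

Identifiers (DOI, ISBN, etc.): eDoc: 732735
ISI: 000379753100015

Degree: -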

Source 1

Title: Nucleic Acids Research (London)

Other: Nucleic Acids Res

Source genre: Journal

Creators / editors:

Affiliations:

Publisher, place: Oxford : Oxford University Press

Pages: -
Volume / Issue: 44 (11)
Sequence number: 11
Start / end page: 5022 - 5033
Identifiers (ISBN, ISSN, DOI, etc.): ISSN: 0305-1048
CoNE: https://pure.mpg.de/cone/journals/resource/110992357379342
