  ganon: precise metagenomics classification against large and up-to-date sets of reference sequences

Piro, V. C., Dadi, T. H., Seiler, E., Reinert, K., & Renard, B. Y. (2020). ganon: precise metagenomics classification against large and up-to-date sets of reference sequences. Bioinformatics, 36(suppl.1), i12-i20. doi:10.1093/bioinformatics/btaa458.

Files

Bioinformatics_Piro et al_2020.pdf (Publisher version), 2MB
Name:
Bioinformatics_Piro et al_2020.pdf
Description:
-
OA-Status:
Not specified
Visibility:
Public
MIME-Type / Checksum:
application/pdf / [MD5]
Technical Metadata:
Copyright Date:
-
Copyright Info:
© The Author(s) 2020

Creators

Creators:
Piro, Vitor C., Author
Dadi, Temesgen Hailemariam (1), Author
Seiler, Enrico (1), Author
Reinert, Knut (2), Author
Renard, Bernhard Y., Author
Affiliations:
(1) IMPRS for Biology and Computation (Anne-Dominique Gindrat), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1479666
(2) Efficient Algorithms for Omics Data (Knut Reinert), Max Planck Fellow Group, Max Planck Institute for Molecular Genetics, Max Planck Society, ou_2385698

Content

Free keywords: -
 Abstract: Motivation: The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing amount of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. Few methods address these issues thus far, and even though many can theoretically handle large amounts of references, time/memory requirements are prohibitive in practice. As a result, many studies that require sequence classification use often outdated and almost never truly up-to-date indices.

Results: Motivated by those limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with a taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references, keeping them updated. It requires <55 min to index the complete RefSeq of bacteria, archaea, fungi and viruses. The tool can further keep these indices up-to-date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets and therefore it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification.
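The combination named in the abstract (an Interleaved Bloom Filter queried per k-mer, followed by k-mer counting and filtering) can be illustrated with a minimal sketch. This is not ganon's implementation: the class and function names, the hash choice, the toy reference sequences and the `min_fraction` cutoff are all assumptions made for illustration, and the taxonomic clustering step is omitted. Each "bin" stands for one group of references; one lookup per k-mer returns a bitmask of all bins that may contain it.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter (hypothetical, for illustration only).

    One Bloom filter per bin, stored interleaved: rows[p] is an integer
    bitmask whose bit b is set when bin b has bit position p set.
    """

    def __init__(self, num_bins, bits_per_bin, num_hashes=3):
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        # Derive num_hashes positions from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode("ascii"), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.bits_per_bin

    def insert(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_id

    def query(self, kmer):
        """Return a bitmask of the bins that may contain kmer."""
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def classify(read, ibf, k, min_fraction=0.8):
    """Count k-mer hits per bin and keep bins above an assumed cutoff."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return []
    counts = [0] * ibf.num_bins
    for kmer in read_kmers:
        mask = ibf.query(kmer)
        for bin_id in range(ibf.num_bins):
            if (mask >> bin_id) & 1:
                counts[bin_id] += 1
    threshold = min_fraction * len(read_kmers)
    return [b for b, c in enumerate(counts) if c >= threshold]


# Toy data (made up): two reference bins and a read taken from bin 0.
references = {0: "ACGTACGGTCAGTTCAGGCATTGA", 1: "TTGACCGATAGGCTAACCGTATGC"}
k = 5
ibf = InterleavedBloomFilter(num_bins=2, bits_per_bin=4096)
for bin_id, seq in references.items():
    for kmer in kmers(seq, k):
        ibf.insert(kmer, bin_id)

hits = classify("ACGGTCAGTTCAGG", ibf, k)
print(hits)
```

Because a Bloom filter has no false negatives, a read drawn from a reference always reaches the cutoff for that reference's bin. False positives are possible, which is why the counts are filtered rather than accepting any single k-mer hit.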

Details

Language(s): eng - English
 Dates: 2020-07-13; 2020-07
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.1093/bioinformatics/btaa458
PMID: 32657362
PMC: PMC7355301
 Degree: -


Source 1

Title: Bioinformatics
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: Oxford : Oxford University Press
Pages: -
Volume / Issue: 36 (suppl.1)
Sequence Number: -
Start / End Page: i12 - i20
Identifier: ISSN: 1367-4803
CoNE: https://pure.mpg.de/cone/journals/resource/954926969991