DREAM-Yara: an exact read mapper for very large databases with short update time

Dadi, Temesgen Hailemariam; Siragusa, Enrico; Piro, Vitor C.; Andrusch, Andreas; Seiler, Enrico; Renard, Bernhard Y.; Reinert, Knut

doi:10.1093/bioinformatics/bty567


Dadi, T. H., Siragusa, E., Piro, V. C., Andrusch, A., Seiler, E., Renard, B. Y., et al. (2018). DREAM-Yara: an exact read mapper for very large databases with short update time. Bioinformatics, 34(17), i766-i772. doi:10.1093/bioinformatics/bty567.

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-000E-5AB1-5
Version Permalink: https://hdl.handle.net/21.11116/0000-000E-5AB2-4

Genre: Journal Article

Files

Bioinformatics_Dadi et al_2018.pdf (Publisher version), 445KB

File Permalink:
https://hdl.handle.net/21.11116/0000-000E-5AB3-3

Name:
Bioinformatics_Dadi et al_2018.pdf

Description:
-

OA-Status:
Not specified

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
© The Author(s) 2018

License:
-

Creators

Creators:
Dadi, Temesgen Hailemariam¹, Author
Siragusa, Enrico, Author
Piro, Vitor C., Author
Andrusch, Andreas, Author
Seiler, Enrico¹, Author
Renard, Bernhard Y., Author
Reinert, Knut², Author

Affiliations:
¹ IMPRS for Biology and Computation (Anne-Dominique Gindrat), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1479666
² Efficient Algorithms for Omics Data (Knut Reinert), Max Planck Fellow Group, Max Planck Institute for Molecular Genetics, Max Planck Society, ou_2385698

Content

Free keywords: -

Abstract: Motivation: Mapping-based approaches have become limited in their application to very large sets of references, since computing an FM-index for very large databases (e.g. >10 GB) has become a bottleneck. This affects many analyses that need such an index as an essential step for approximate matching of NGS reads to reference databases. For instance, in typical metagenomics analyses, the reference sequences have become too large to compute a single full-text index on standard machines. Even on large-memory machines, computing such an index takes about one day of computing time. As a result, indices are rarely updated. Hence, it is desirable to create an alternative way of indexing while preserving fast search times.

Results: To solve the index construction and update problem, we propose the DREAM (Dynamic seaRchablE pArallel coMpressed index) framework and provide an implementation. The first main contribution is an approximate search distributor based on a novel use of Bloom filters. We combine several Bloom filters to form an interleaved Bloom filter and use this new data structure to quickly rule out, for each read, the parts of the database where it cannot match. This allows us to keep the database in several indices, which can be easily rebuilt when parts are updated, while maintaining a fast search time. The second main contribution is DREAM-Yara, a distributed version of a fully sensitive read mapper implemented under the DREAM framework.
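The interleaved Bloom filter idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names, the SHA-256-based hashing, the slot layout as a list of integers, and the toy sequences are all hypothetical. Each of the b bins (parts of the database) gets its own Bloom filter, and the filters are interleaved so that one hash position yields a b-bit vector covering all bins at once; ANDing those vectors over the hash functions shows which bins may contain a k-mer. For the filtering threshold the sketch assumes the standard k-mer (q-gram) lemma: a read of length n that matches a location with at most e errors shares at least n - k + 1 - e*k k-mers with it, so bins whose count falls below that value can be skipped.

```python
import hashlib
from typing import List


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter (hypothetical): one Bloom filter per bin,
    laid out so that slot h holds bit h of every bin's filter side by side."""

    def __init__(self, num_bins: int, bits_per_bin: int, num_hashes: int) -> None:
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        # slots[h] is a num_bins-wide bit vector; bit b belongs to bin b.
        self.slots = [0] * bits_per_bin

    def _positions(self, kmer: str) -> List[int]:
        # Deterministic hash positions (assumption: seeded SHA-256, not the paper's hashing).
        return [
            int.from_bytes(
                hashlib.sha256(f"{seed}:{kmer}".encode()).digest()[:8], "little"
            ) % self.bits_per_bin
            for seed in range(self.num_hashes)
        ]

    def insert(self, bin_id: int, kmer: str) -> None:
        # Set bit bin_id in every slot the k-mer hashes to.
        for pos in self._positions(kmer):
            self.slots[pos] |= 1 << bin_id

    def query_kmer(self, kmer: str) -> int:
        # AND the slots: a set bit b means bin b may contain the k-mer.
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.slots[pos]
        return mask

    def count(self, read: str, k: int) -> List[int]:
        # Per bin, how many of the read's k-mers (with multiplicity) it may contain.
        counts = [0] * self.num_bins
        for km in kmers(read, k):
            mask = self.query_kmer(km)
            for b in range(self.num_bins):
                if (mask >> b) & 1:
                    counts[b] += 1
        return counts


def kmer_threshold(read_len: int, k: int, errors: int) -> int:
    # k-mer lemma: a match with at most `errors` edits shares at least this many k-mers.
    return read_len - k + 1 - errors * k


def candidate_bins(ibf: InterleavedBloomFilter, read: str, k: int, errors: int) -> List[int]:
    t = kmer_threshold(len(read), k, errors)
    if t <= 0:
        # The lemma gives no guarantee here, so no bin can be excluded.
        return list(range(ibf.num_bins))
    counts = ibf.count(read, k)
    return [b for b, c in enumerate(counts) if c >= t]


if __name__ == "__main__":
    ibf = InterleavedBloomFilter(num_bins=2, bits_per_bin=1024, num_hashes=2)
    toy_bins = {0: "ACGTACGTTGCAACGT", 1: "TTTTGGGGCCCCAAAA"}  # toy references
    for bin_id, ref in toy_bins.items():
        for km in kmers(ref, 4):
            ibf.insert(bin_id, km)
    print(candidate_bins(ibf, "ACGTACGTTG", k=4, errors=1))
```

Because Bloom filters have no false negatives, bin 0 (which contains the read exactly) is always reported; bin 1 is reported only if false positives push its count up to the threshold. In a DREAM-style setup, only the indices of the reported bins would then be searched by the mapper, which is what lets each bin's index be rebuilt independently.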

Details

Language(s): eng - English

Dates: Date issued: 2018-09-08

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.1093/bioinformatics/bty567
PMID: 30423080

Degree: -

Source 1

Title: Bioinformatics

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: Oxford : Oxford University Press

Pages: -
Volume / Issue: 34 (17)
Sequence Number: -
Start / End Page: i766 - i772
Identifier: ISSN: 1367-4803
CoNE: https://pure.mpg.de/cone/journals/resource/954926969991