Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences

Seiler, Enrico; Mehringer, Svenja; Mitra, Darvish; Turc, Etienne; Reinert, Knut

doi:10.1016/j.isci.2021.102782

Seiler, E., Mehringer, S., Mitra, D., Turc, E., & Reinert, K. (2021). Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences. iScience, 24(7): 102782. doi:10.1016/j.isci.2021.102782.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-000E-59C2-3
Version Permalink: https://hdl.handle.net/21.11116/0000-000E-59C3-2

Genre: Journal Article

Files

iScience_Seiler et al_2021.pdf (Publisher version), 2MB

File Permalink:
https://hdl.handle.net/21.11116/0000-000E-59C4-1

Name:
iScience_Seiler et al_2021.pdf

Description:
-

OA-Status:
Not specified

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
© 2021 The Author(s)

License:
https://creativecommons.org/licenses/by/4.0/

Creators

Creators:
Seiler, Enrico¹, Author
Mehringer, Svenja¹, Author
Mitra, Darvish¹, Author
Turc, Etienne, Author
Reinert, Knut², Author

Affiliations:
¹ IMPRS for Biology and Computation (Anne-Dominique Gindrat), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1479666
² Efficient Algorithms for Omics Data (Knut Reinert), Max Planck Fellow Group, Max Planck Institute for Molecular Genetics, Max Planck Society, ou_2385698

Content

Free keywords: bioinformatics; genetics; high-performance computing in bioinformatics

Abstract: We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filters (IBFs) as a set membership data structure and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara.
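Illustration (not part of the published record): the abstract names two building blocks, winnowing minimizers and an interleaved Bloom filter (IBF). The Python sketch below shows one minimal way the two can fit together, under several loud assumptions: the names `kmer_hash`, `minimizers`, `InterleavedBloomFilter`, and `candidate_bins` are hypothetical, reverse complements are ignored, the IBF is a plain uncompressed list of bitmasks, and a fixed user-supplied threshold stands in for the probabilistic thresholding described in the paper. It is not Raptor's implementation.

```python
import hashlib


def kmer_hash(kmer: str, seed: int = 0) -> int:
    """Hash a k-mer to a 64-bit integer; `seed` selects one of several hash functions."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minimizers(seq: str, k: int, w: int) -> set:
    """Return the k-mers with the smallest hash in each window of w nucleotides (w >= k)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    per_window = w - k + 1  # number of k-mers inside one window
    chosen = set()
    for start in range(len(kmers) - per_window + 1):
        chosen.add(min(kmers[start:start + per_window], key=kmer_hash))
    return chosen


class InterleavedBloomFilter:
    """One Bloom filter per bin, stored interleaved.

    rows[p] is an integer bitmask of width `bins`; bit b of rows[p] is
    position p of the Bloom filter belonging to bin b.
    """

    def __init__(self, bins: int, bits_per_bin: int, num_hashes: int = 2):
        self.bins = bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer: str) -> list:
        return [kmer_hash(kmer, s) % self.bits_per_bin for s in range(self.num_hashes)]

    def insert(self, kmer: str, bin_index: int) -> None:
        for p in self._positions(kmer):
            self.rows[p] |= 1 << bin_index

    def query(self, kmer: str) -> int:
        """Return a bitmask; bit b is set if the k-mer may be present in bin b."""
        mask = (1 << self.bins) - 1
        for p in self._positions(kmer):
            mask &= self.rows[p]  # AND of whole rows answers all bins at once
        return mask


def candidate_bins(ibf, read: str, k: int, w: int, threshold: int) -> list:
    """List the bins in which at least `threshold` minimizers of the read are found."""
    counts = [0] * ibf.bins
    for m in minimizers(read, k, w):
        mask = ibf.query(m)
        for b in range(ibf.bins):
            if (mask >> b) & 1:
                counts[b] += 1
    return [b for b, c in enumerate(counts) if c >= threshold]


# Usage: index two made-up reference sequences, one per bin, then pre-filter a read.
references = ["ACGTTGCATGCCGATAGGCTAACGTAGC", "TTGACCGGTATCGGATCCAAGTCTGAAC"]
ibf = InterleavedBloomFilter(bins=2, bits_per_bin=1024)
for bin_index, ref in enumerate(references):
    for m in minimizers(ref, k=4, w=8):
        ibf.insert(m, bin_index)

read = references[0][5:20]  # an error-free substring of the first reference
hits = candidate_bins(ibf, read, k=4, w=8, threshold=len(minimizers(read, 4, 8)))
```

Because every window of an error-free substring is also a window of the reference, all of the read's minimizers are found in bin 0, so bin 0 is always reported; other bins can appear only as Bloom filter false positives. A downstream tool would then search the read only in the reported bins.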

Details

Language(s): eng - English

Dates: Published Online: 2021-06-24

Publication Status: Published online

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.1016/j.isci.2021.102782
PMID: 34337360
PMC: PMC8313605

Degree: -

Source 1

Title: iScience

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: Amsterdam ; Boston ; London ; New York ; Oxford ; Paris ; Philadelphia ; San Diego ; St. Louis : Elsevier

Pages: -
Volume / Issue: 24 (7)
Sequence Number: 102782
Start / End Page: -
Identifier: ISSN: 2589-0042
CoNE: https://pure.mpg.de/cone/journals/resource/2589-0042