Released

Journal Article

Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences

MPS-Authors
Seiler, Enrico
IMPRS for Biology and Computation (Anne-Dominique Gindrat), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society;

Mehringer, Svenja
IMPRS for Biology and Computation (Anne-Dominique Gindrat), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society;

Darvish, Mitra
IMPRS for Biology and Computation (Anne-Dominique Gindrat), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society;

Reinert, Knut
Efficient Algorithms for Omics Data (Knut Reinert), Max Planck Fellow Group, Max Planck Institute for Molecular Genetics, Max Planck Society;

Fulltext (public)

iScience_Seiler et al_2021.pdf
(Publisher version), 2MB

Citation

Seiler, E., Mehringer, S., Darvish, M., Turc, E., & Reinert, K. (2021). Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences. iScience, 24(7): 102782. doi:10.1016/j.isci.2021.102782.


Cite as: https://hdl.handle.net/21.11116/0000-000E-59C2-3
Abstract
We present Raptor, a system for approximately searching many queries, such as next-generation sequencing reads or transcripts, in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filter (IBF) as a set membership data structure, and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara.
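
The two building blocks named in the abstract, winnowing minimizers and an interleaved Bloom filter, can be illustrated with a small Python sketch. This is not Raptor's implementation: the names `kmer_hash`, `minimizers`, and `InterleavedBloomFilter`, the choice of BLAKE2b as hash function, and the parameters (`k`, `w` counted in k-mers, `bin_size`, `num_hashes`) are all assumptions made for illustration. The sketch only reports raw per-bin minimizer counts and leaves out the probabilistic thresholding, compression, and partitioning described above.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Hash a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq: str, k: int, w: int) -> set:
    """Return the minimizer k-mers of seq.

    Assumption: w counts consecutive k-mers per window; in each window the
    k-mer with the smallest hash value is kept.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    n_windows = max(len(kmers) - w + 1, 1)
    return {min(kmers[s:s + w], key=kmer_hash) for s in range(n_windows)}


class InterleavedBloomFilter:
    """Toy IBF: one Bloom filter per bin, stored interleaved.

    rows[p] is an integer whose bit j is position p of bin j's Bloom filter,
    so a single lookup of rows[p] answers the same position for all bins.
    """

    def __init__(self, bins: int, bin_size: int, num_hashes: int = 2):
        self.bins = bins
        self.bin_size = bin_size
        self.num_hashes = num_hashes
        self.rows = [0] * bin_size

    def _positions(self, kmer: str):
        # Derive num_hashes positions by salting the hash with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.bin_size

    def insert(self, kmer: str, bin_index: int) -> None:
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_index

    def query(self, kmer: str) -> int:
        """Bitmask of bins that may contain kmer (false positives possible)."""
        mask = (1 << self.bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask

    def count(self, kmers) -> list:
        """For each bin, count how many of the given k-mers it may contain."""
        counts = [0] * self.bins
        for kmer in kmers:
            mask = self.query(kmer)
            for b in range(self.bins):
                counts[b] += (mask >> b) & 1
        return counts


if __name__ == "__main__":
    # Hypothetical toy data: two reference sequences, one per bin.
    references = ["ACGTTGCATGCCGATAGCTAGGCTAACGTTAGC",
                  "TTTTGGGGCCCCAAAATTGGCCAATTGGAACCG"]
    ibf = InterleavedBloomFilter(bins=len(references), bin_size=1024)
    for bin_index, ref in enumerate(references):
        for m in minimizers(ref, k=5, w=4):
            ibf.insert(m, bin_index)

    read = references[0][5:25]  # error-free read taken from bin 0
    read_minimizers = minimizers(read, k=5, w=4)
    print(len(read_minimizers), ibf.count(read_minimizers))
```

In this toy setting, a pre-filter would pass the read on only to those bins whose count reaches some threshold. Because a Bloom filter has no false negatives, an error-free read always reaches the full minimizer count in its bin of origin; choosing the threshold for reads with errors is where the probabilistic thresholding mentioned in the abstract comes in.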