  Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences

Seiler, E., Mehringer, S., Mitra, D., Turc, E., & Reinert, K. (2021). Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences. iScience, 24(7): 102782. doi:10.1016/j.isci.2021.102782.

Files

iScience_Seiler et al_2021.pdf (Publisher version), 2MB
Name:
iScience_Seiler et al_2021.pdf
Description:
-
OA-Status:
Not specified
Visibility:
Public
MIME-Type / Checksum:
application/pdf / [MD5]
Technical Metadata:
Copyright Date:
-
Copyright Info:
© 2021 The Author(s)

Creators

Creators:
Seiler, Enrico (1), Author
Mehringer, Svenja (1), Author
Mitra, Darvish (1), Author
Turc, Etienne, Author
Reinert, Knut (2), Author
Affiliations:
(1) IMPRS for Biology and Computation (Anne-Dominique Gindrat), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1479666
(2) Efficient Algorithms for Omics Data (Knut Reinert), Max Planck Fellow Group, Max Planck Institute for Molecular Genetics, Max Planck Society, ou_2385698

Content

Free keywords: bioinformatics; genetics; high-performance computing in bioinformatics
 Abstract: We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filters (IBFs) as a set membership data structure and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara.
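
The two building blocks named in the abstract, winnowing minimizers and an interleaved Bloom filter, can be illustrated with a minimal sketch. This is not Raptor's implementation: the names `kmer_hash`, `minimizers`, `InterleavedBloomFilter` and `candidate_bins` are hypothetical, BLAKE2b is an arbitrary hash choice, and a fixed `threshold` parameter stands in for the probabilistic thresholding described in the paper.

```python
import hashlib


def kmer_hash(kmer, seed=0):
    """64-bit hash of a k-mer; BLAKE2b is an illustrative choice only."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8, salt=bytes([seed])).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq, k, w):
    """Set of k-mers with the smallest hash in some window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    # A sequence with fewer than w k-mers is treated as one short window.
    for start in range(max(1, len(kmers) - w + 1)):
        window = kmers[start:start + w]
        if window:
            picked.add(min(window, key=kmer_hash))
    return picked


class InterleavedBloomFilter:
    """Toy IBF: one Bloom filter per bin, stored so that row p holds bit p of every bin."""

    def __init__(self, bins, bits_per_bin, num_hashes=2):
        self.bins = bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        # rows[p] is an integer bitmask; bit b belongs to bin b.
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        for seed in range(self.num_hashes):
            yield kmer_hash(kmer, seed) % self.bits_per_bin

    def insert(self, kmer, bin_index):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_index

    def query(self, kmer):
        """Bitmask of all bins that may contain kmer (false positives possible)."""
        mask = (1 << self.bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def candidate_bins(ibf, query, k, w, threshold):
    """Bins in which at least `threshold` minimizers of the query are found."""
    counts = [0] * ibf.bins
    for m in minimizers(query, k, w):
        hits = ibf.query(m)
        for b in range(ibf.bins):
            if (hits >> b) & 1:
                counts[b] += 1
    return [b for b, c in enumerate(counts) if c >= threshold]
```

Because the bits of all bins are interleaved by position, a single lookup per hash function answers the membership question for every bin at once, which is what makes the structure suitable as a pre-filter in front of a more expensive search.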

Details

Language(s): eng - English
 Dates: 2021-06-24
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.1016/j.isci.2021.102782
PMID: 34337360
PMC: PMC8313605
 Degree: -

Source 1

Title: iScience
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: Amsterdam ; Boston ; London ; New York ; Oxford ; Paris ; Philadelphia ; San Diego ; St. Louis : Elsevier
Pages: -
Volume / Issue: 24 (7)
Sequence Number: 102782
Start / End Page: -
Identifier: ISSN: 2589-0042
CoNE: https://pure.mpg.de/cone/journals/resource/2589-0042