

  Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate

Buschmann, T., Zhang, R., Brash, D. E., & Bystrykh, L. V. (2014). Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate. BMC Bioinformatics, 15: 264. doi:10.1186/1471-2105-15-264.

Buschmann_Enhancing.pdf (Publisher version), 1016KB
MIME-Type: application/pdf
Copyright Info:
© 2014 Buschmann et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.




Buschmann, Tilo (1, 2), Author
Zhang, Rong (3, 4), Author
Brash, Douglas E. (3), Author
Bystrykh, Leonid V. (5), Author
1. Max Planck Research Group Neuroanatomy and Connectivity, MPI for Human Cognitive and Brain Sciences, Max Planck Society, Leipzig, DE
2. Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
3. Department of Therapeutic Radiology, Yale School of Medicine, New Haven, CT, USA
4. Department of Toxicology, School of Public Health, Hebei Medical University, Shijiazhuang, China
5. Laboratory of Ageing Biology and Stem Cells, European Research Institute for the Biology of Ageing, University Medical Center Groningen, the Netherlands


Free keywords: -
 Abstract: Background: DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method suitable for the detection of barcoded reads and suggest possible improvements.
Results: In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof-of-concept experiment, we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired-end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.
Conclusion: Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow a proper trade-off between sensitivity and precision to be chosen.
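
The idea of using a tail area-based FDR to pick the highest acceptable minimal distance can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes Hamming distance as the metric, a null sample of minimal distances computed from unbarcoded sequences (for example, windows of reference DNA), and a conservative null proportion `pi0 = 1.0`. All function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(read_prefix: str, barcodes: Sequence[str]) -> int:
    """Minimal distance between a read prefix and any barcode in the set."""
    return min(hamming(read_prefix, bc) for bc in barcodes)


def tail_area_fdr(observed: Sequence[int], null: Sequence[int],
                  threshold: int, pi0: float = 1.0) -> float:
    """Estimate Fdr(t) = pi0 * F0(t) / F(t), where F0 and F are the
    fractions of null and observed minimal distances that are <= t."""
    if not observed or not null:
        raise ValueError("observed and null samples must be non-empty")
    n_called = sum(d <= threshold for d in observed)
    if n_called == 0:
        return 0.0  # nothing is called at this threshold
    null_tail = sum(d <= threshold for d in null) / len(null)
    observed_tail = n_called / len(observed)
    return min(1.0, pi0 * null_tail / observed_tail)


def max_acceptable_distance(observed: Sequence[int], null: Sequence[int],
                            alpha: float = 0.05,
                            pi0: float = 1.0) -> Optional[int]:
    """Largest observed distance whose estimated tail-area FDR is <= alpha,
    or None if no threshold qualifies."""
    best = None
    for t in sorted(set(observed)):
        if tail_area_fdr(observed, null, t, pi0) <= alpha:
            best = t
    return best


if __name__ == "__main__":
    # Made-up toy distances, for illustration only (not data from the paper).
    observed = [0] * 80 + [1] * 10 + [3] * 5 + [4] * 5
    null = [2] * 1 + [3] * 9 + [4] * 40 + [5] * 50
    print(max_acceptable_distance(observed, null, alpha=0.05))
```

Reads whose minimal distance is at or below the returned threshold would be treated as barcoded. Raising `alpha` trades precision for sensitivity, which is the trade-off the abstract refers to.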


Language(s): eng - English
 Dates: 2013-12-13, 2014-07-19, 2014-08-07
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: DOI: 10.1186/1471-2105-15-264
PMID: 25099007
PMC: PMC4133078
 Degree: -




Source 1

Title: BMC Bioinformatics
Source Genre: Journal
Publ. Info: BioMed Central
Pages: -
Volume / Issue: 15
Sequence Number: 264
Start / End Page: -
Identifier: ISSN: 1471-2105
CoNE: https://pure.mpg.de/cone/journals/resource/111000136905000