  Evaluating information content of SNPs for sample-tagging in re-sequencing projects

Hu, H., Liu, X., Jin, W., Ropers, H. H., & Wienker, T. F. (2015). Evaluating information content of SNPs for sample-tagging in re-sequencing projects. Scientific Reports, 5: 10247. doi:10.1038/srep10247.

Basic

Item Permalink: http://hdl.handle.net/21.11116/0000-0000-C68E-1
Version Permalink: http://hdl.handle.net/21.11116/0000-0000-C68F-0
Genre: Journal Article

Files

Hu.pdf (Publisher version), 871KB
Name: Hu.pdf
Description: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: © 2018 Macmillan Publishers Limited, part of Springer Nature

Locators

Description: -

Creators

Creators:
Hu, H. (1), Author
Liu, X., Author
Jin, W., Author
Ropers, H. H. (1), Author
Wienker, T. F. (2), Author
Affiliations:
(1) Emeritus Group of Human Molecular Genetics (Head: Hans-Hilger Ropers), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_2385695
(2) Clinical Genetics (Thomas F. Wienker), Emeritus Group of Human Molecular Genetics (Head: Hans-Hilger Ropers), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_2385696

Content

Free keywords: Base Sequence; Chromosome Mapping/*methods; Gene Frequency/*genetics; Genetic Markers/genetics; Genetic Variation; Genetics, Population/*methods; Genome, Human/*genetics; Genotype; High-Throughput Nucleotide Sequencing; Humans; Models, Theoretical; Polymorphism, Single Nucleotide/*genetics; Reproducibility of Results; Sequence Analysis, DNA
Abstract: Sample-tagging is designed to identify accidental sample mix-ups, which are a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world population, and that only 30 optimized SNPs are sufficient in practice for labeling up to 100 thousand individuals. In simulated populations of 100 thousand individuals, the average Hamming distances generated by the optimized set of 30 SNPs are larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proves robust across large sample sizes and different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection, allowing for customized SNP numbers and genes of interest. A sample-tagging plan based on this framework will improve the reliability and cost-effectiveness of re-sequencing projects.
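The quantities named in the abstract (per-SNP information content and Hamming distance between SNP "barcodes") can be illustrated with a minimal Python sketch. This is not the authors' model or program, which this record does not describe. It assumes biallelic SNPs in Hardy-Weinberg equilibrium, independent (unlinked) SNPs, Shannon entropy as the information measure, and genotypes coded 0/1/2. All function names are hypothetical.

```python
import math
import random


def genotype_probs(maf):
    """Genotype frequencies (hom-ref, het, hom-alt) for a biallelic SNP.

    Assumption: Hardy-Weinberg equilibrium with minor allele frequency `maf`.
    """
    p, q = 1.0 - maf, maf
    return (p * p, 2.0 * p * q, q * q)


def snp_entropy_bits(maf):
    """Shannon entropy (bits) of the genotype distribution at one SNP."""
    return -sum(g * math.log2(g) for g in genotype_probs(maf) if g > 0)


def match_probability(maf):
    """Chance that two unrelated individuals share a genotype at one SNP."""
    return sum(g * g for g in genotype_probs(maf))


def expected_hamming(mafs):
    """Expected number of differing genotypes between two random individuals.

    Assumption: SNPs are independent of each other.
    """
    return sum(1.0 - match_probability(m) for m in mafs)


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def simulate_barcodes(mafs, n, seed=0):
    """Draw `n` random genotype barcodes (tuples of 0/1/2) for the panel."""
    rng = random.Random(seed)
    return [
        tuple(rng.choices((0, 1, 2), weights=genotype_probs(m))[0] for m in mafs)
        for _ in range(n)
    ]


if __name__ == "__main__":
    panel = [0.5] * 30  # hypothetical panel: 30 SNPs, each with MAF 0.5
    print(snp_entropy_bits(0.5))    # 1.5 bits per SNP under these assumptions
    print(expected_hamming(panel))  # 18.75
    codes = simulate_barcodes(panel, 200, seed=1)
    print(min(hamming(a, b) for i, a in enumerate(codes) for b in codes[i + 1:]))
```

Under these assumptions a SNP with minor allele frequency 0.5 carries the most information (1.5 bits), and a panel of 30 such SNPs gives an expected pairwise Hamming distance of 18.75. A real panel would also need to account for linkage between SNPs and for population-specific allele frequencies.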

Details

Language(s): eng - English
 Dates: 2015-05-15, 2015
 Publication Status: Published in print
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Method: -
 Identifiers: DOI: 10.1038/srep10247
ISSN: 2045-2322 (Electronic)
 Degree: -

Source 1

Title: Scientific Reports
  Abbreviation : Sci. Rep.
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: London, UK : Nature Publishing Group
Pages: -
Volume / Issue: 5
Sequence Number: 5:10247
Start / End Page: -
Identifier: ISSN: 2045-2322
CoNE: https://pure.mpg.de/cone/journals/resource/2045-2322