Evaluating information content of SNPs for sample-tagging in re-sequencing projects

Hu, H.; Liu, X.; Jin, W.; Ropers, H. H.; Wienker, T. F.

doi:10.1038/srep10247

Journal Article

MPG Authors

Hu, H.
Emeritus Group of Human Molecular Genetics (Head: Hans-Hilger Ropers), Max Planck Institute for Molecular Genetics, Max Planck Society;

Ropers, H. H.
Emeritus Group of Human Molecular Genetics (Head: Hans-Hilger Ropers), Max Planck Institute for Molecular Genetics, Max Planck Society;

Wienker, T. F.
Clinical Genetics (Thomas F. Wienker), Emeritus Group of Human Molecular Genetics (Head: Hans-Hilger Ropers), Max Planck Institute for Molecular Genetics, Max Planck Society;

External Resources

http://www.ncbi.nlm.nih.gov/pubmed/25975447
(any full text)

Full Texts (freely accessible)

Hu.pdf
(Publisher's version), 871 KB

Supplementary Material (freely accessible)

No freely accessible supplementary material is available.

Citation

Hu, H., Liu, X., Jin, W., Ropers, H. H., & Wienker, T. F. (2015). Evaluating information content of SNPs for sample-tagging in re-sequencing projects. Scientific Reports, 5: 10247. doi:10.1038/srep10247.

Cite as: https://hdl.handle.net/21.11116/0000-0000-C68E-1

Abstract

Sample-tagging is designed to identify accidental sample mix-ups, a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world population, and that only 30 optimized SNPs are in practice sufficient for labeling up to 100 thousand individuals. In simulated populations of 100 thousand individuals, the average Hamming distance generated by the optimized set of 30 SNPs is larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proves robust across large sample sizes and different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection that allows customized SNP numbers and genes of interest. A sample-tagging plan based on this framework will improve the reliability and cost-effectiveness of re-sequencing projects.
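
The abstract does not spell out the model itself, so the following is only a minimal sketch of how the information content and discriminating power of a SNP panel might be estimated. It assumes biallelic, mutually independent SNPs in Hardy-Weinberg equilibrium, uses Shannon entropy of the genotype distribution as the information measure, and takes the Hamming distance between two samples to be the number of SNPs at which their genotypes differ. All function names are hypothetical and are not taken from the authors' program.

```python
import math


def genotype_probs(maf):
    """Hardy-Weinberg genotype frequencies (hom-major, het, hom-minor).

    Assumption: biallelic SNP with minor allele frequency `maf`.
    """
    p, q = 1.0 - maf, maf
    return (p * p, 2.0 * p * q, q * q)


def snp_entropy_bits(maf):
    """Shannon entropy (bits) of one SNP's genotype distribution."""
    return -sum(g * math.log2(g) for g in genotype_probs(maf) if g > 0.0)


def genotype_match_prob(maf):
    """Probability that two unrelated individuals share a genotype at one SNP."""
    return sum(g * g for g in genotype_probs(maf))


def expected_hamming(mafs):
    """Expected number of SNPs at which two random individuals differ."""
    return sum(1.0 - genotype_match_prob(m) for m in mafs)


def pair_identity_prob(mafs):
    """Probability that two random individuals match at every SNP.

    Assumption: SNPs are independent (no linkage disequilibrium).
    """
    return math.prod(genotype_match_prob(m) for m in mafs)


# Hypothetical idealized panel: 30 SNPs, each with minor allele frequency 0.5.
panel = [0.5] * 30
panel_bits = sum(snp_entropy_bits(m) for m in panel)

print(panel_bits)                 # 45.0
print(expected_hamming(panel))    # 18.75
print(pair_identity_prob(panel))  # tiny probability of a full 30-SNP match
```

Under these simplifying assumptions, an idealized 30-SNP panel yields an expected pairwise Hamming distance of 18.75, which is at least in line with the "larger than 18" figure reported in the abstract. The paper's own definitions of information content and duality frequency may differ from the ones sketched here.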