




Evaluation of Population-Based Haplotype Phasing Algorithms


Sethi, Riccha
International Max Planck Research School, MPI for Informatics, Max Planck Society;
Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society;


Sethi, R. (2016). Evaluation of Population-Based Haplotype Phasing Algorithms. Master Thesis, Universität des Saarlandes, Saarbrücken.

Cite as: https://hdl.handle.net/11858/00-001M-0000-002C-41DA-7
The correct order of alleles on haplotypes is valuable information with many applications in genome-wide association studies (GWAS) and population genetics. A considerable number of computational and statistical algorithms have been developed for haplotype phasing. Historically, these algorithms were compared on simulated population data with relatively sparse markers, modeled on genotype data from the HapMap project. With the advancement and falling cost of next-generation sequencing (NGS), thousands of individuals across the world have now been sequenced in the 1000 Genomes Project. This has produced genotype information for individuals of different ethnicities, with a much denser set of genetic variants. Here, we developed a scalable approach to assess state-of-the-art population-based haplotype phasing algorithms, using benchmark data designed by simulating the population (unrelated and related individuals), the NGS pipeline, and genotype calling. The most accurate algorithm for phase inference in unrelated individuals was MVNCall (v1), while the DuoHMM approach of Shapeit (v2) had the lowest switch error rate, 0.298% (with true genotype likelihoods), in related individuals. We also conducted a comprehensive assessment of algorithms for imputing missing genotypes in the population with a reference panel. On this metric, Impute2 (v2.3.2) and Beagle (v4.1) both performed competitively under different imputation scenarios, with genotype concordance rates of >99%. However, Impute2 was better at imputing genotypes with a minor allele frequency of <0.025 in the reference panel.
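
The abstract reports a switch error rate and a genotype concordance rate without defining them. The Python sketch below shows one common way such metrics are computed; it is an illustration under stated assumptions, not the evaluation code used in the thesis. The function names `switch_error_rate` and `genotype_concordance` are hypothetical, haplotypes are assumed to be biallelic 0/1 sequences, genotypes are assumed to be alternate-allele counts (0, 1, 2), and the inferred haplotypes are assumed to carry the correct genotype at every heterozygous site.

```python
from typing import Sequence, Tuple

Haplotypes = Tuple[Sequence[int], Sequence[int]]


def switch_error_rate(truth: Haplotypes, inferred: Haplotypes) -> float:
    """Fraction of consecutive heterozygous-site pairs whose relative phase is wrong.

    Assumption: the inferred genotype is correct at every truly heterozygous site.
    """
    t1, t2 = truth
    i1, _ = inferred
    # Phase is only informative at heterozygous sites.
    het = [k for k in range(len(t1)) if t1[k] != t2[k]]
    if len(het) < 2:
        return 0.0
    # True where inferred haplotype 1 follows true haplotype 1, False where it follows haplotype 2.
    orient = [i1[k] == t1[k] for k in het]
    # A switch error is a change of orientation between neighbouring heterozygous sites.
    switches = sum(1 for a, b in zip(orient, orient[1:]) if a != b)
    return switches / (len(het) - 1)


def genotype_concordance(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Fraction of sites where the imputed genotype equals the true genotype."""
    if len(true_gt) != len(imputed_gt) or len(true_gt) == 0:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(1 for t, g in zip(true_gt, imputed_gt) if t == g)
    return matches / len(true_gt)


# Toy data (made up for illustration): four heterozygous sites, one homozygous site.
truth = ([0, 0, 1, 1, 0], [1, 1, 0, 0, 0])
inferred = ([0, 0, 0, 0, 0], [1, 1, 1, 1, 0])
ser = switch_error_rate(truth, inferred)
conc = genotype_concordance([0, 1, 2, 1], [0, 1, 1, 1])
```

In the toy data the inferred phase flips once between the second and third heterozygous sites, so one of the three consecutive pairs is counted as a switch. Swapping the two inferred haplotypes as a whole does not change the result, since only the relative phase between neighbouring sites is scored.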