Free keywords:
Haplotype reconstruction; polyploid genomes; genome assembly
Abstract:
In this thesis, we focus on the problem of reconstructing haplotypes for polyploid genomes and on the use of the called haplotypes in de novo assembly of these genomes. We approach this topic by exploring short-read sequence data from the highly heterozygous hexaploid sweet potato genome. First, we investigate the role of heterozygosity and ploidy in reconstructing haplotypes from short reads. In short, higher heterozygosity provides a larger number of reads that are useful for reconstructing haplotypes, while polyploidy introduces a challenge in assembling reads into longer sequences, which we call the problem of Ambiguity of Merging fragments. We address this problem and show that short reads can be assembled into haplotypes with high accuracy. To this end, we propose a new algorithm, called Ranbow, and evaluate its performance on real and simulated datasets from the tetraploid Capsella bursa-pastoris (shepherd's purse) and hexaploid Ipomoea batatas (sweet potato) genomes. We show that our method achieves higher accuracy and longer assembled haplotypes than the other methods compared. Next, we present the de novo assembly pipeline of the sweet potato genome, which uses the computed haplotypes to improve the genome assembly. This novel approach, called haplo-scaffolders, uses the assembled haplotypes to rescue a set of potential connections that were hidden by differences between the true haplotypes and the reference sequence. These connections are obtained by mapping the reads to the haplotypes and transforming the connection information to the reference level. This process can be repeated, updating the scaffold set each time, to further improve the genome assembly. We show that this strategy substantially improves the N50 and the maximum scaffold length of the assembled sweet potato genome.
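For readers unfamiliar with the N50 metric mentioned above, the following is a minimal sketch of its standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name `n50` and the scaffold lengths are illustrative assumptions, not values or code from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one scaffold length")


# Hypothetical scaffold lengths (not data from the thesis).
before = [50, 40, 30, 20, 10]
# Joining the two longest scaffolds through a rescued connection.
after = [90, 30, 20, 10]

print(n50(before), max(before))
print(n50(after), max(after))
```

In this toy case, joining two scaffolds raises both the N50 and the maximum scaffold length, which is the kind of improvement a scaffolding step aims for.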