Free keywords:
Bioinformatics; Algorithms; Sequence Analysis; Alignment; Variant Detection
Abstract:
Next-generation sequencing (NGS) has brought about a revolution in sequence analysis with its
broad spectrum of applications, ranging from genome resequencing to transcriptomics or
metagenomics, and from fundamental research to diagnostics. The tremendous amounts of data
necessitate highly efficient computational analysis tools for the wide variety of NGS applications.
This thesis addresses a broad range of key computational aspects of resequencing applications,
where a reference genome sequence is known and heavily used for interpretation of the newly
sequenced sample. It presents tools for read mapping and benchmarking, for partial read mapping
of small RNA reads and for structural variant/indel detection, and finally tools for detecting and
genotyping SNVs and short indels. Our tools efficiently scale to large NGS data sets and are
well-suited for advances in sequencing technology, since their generic algorithm design allows handling
of arbitrary read lengths and variable error rates. Furthermore, they are implemented within the
robust C++ library SeqAn, making them open-source, easily available, and potentially adaptable
for the bioinformatics community. Among other applications, our tools have been integrated into
a large-scale analysis pipeline and have been applied to large data sets, leading to interesting
discoveries of human retrocopy variants and insights into the genetic causes of X-linked intellectual
disabilities.