  Robust Normalization of Next Generation Sequencing Data

Helmuth, J. (2017). Robust Normalization of Next Generation Sequencing Data. PhD Thesis. doi:10.17169/refubium-6942.

Creators

 Creators:
Helmuth, Johannes (1, 2), Author
Vingron, Martin (3), Referee
Affiliations:
(1) Computational Epigenetics (Ho-Ryun Chung), Independent Junior Research Groups (OWL), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1479658
(2) Fachbereich Mathematik und Informatik der Freien Universität Berlin, ou_persistent22
(3) Transcriptional Regulation (Martin Vingron), Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1479639

Content

Free keywords: Normalization of read count data; Enrichment Calling; Difference Calling; ChIP-seq; RNA-seq; ATAC-seq; STARR-seq
Abstract: Molecular Biology pertains to the molecular basis of the regulation of biomolecular processes in the cell, e.g. gene expression or the genome-wide localization of DNA-associated proteins. These molecular quantities are routinely measured by Next Generation Sequencing (NGS)-based techniques due to their genome-wide scalability and cost-efficiency. In order to discern background regions from genomic loci that harbor a biologically relevant signal, i.e. difference calling, the NGS measurements need to be corrected for technical biases with the help of a control, i.e. normalization. However, the normalization itself requires the knowledge of background regions and, consequently, difference calling and normalization are inseparable. Here, this problem is solved by the data-driven “normR” framework, which models the interdependency of NGS measurements in background and signal regions as a multinomial sampling trial with a binomial mixture model. The robust normR normalization accounts for the effect of signal on the overall measurement statistic by modeling treatment and control simultaneously. In this thesis, I used normR in three studies concerning the inference of DNA-protein binding from ChIP-seq data. Firstly, the two-component “enrichR” model is shown to achieve a more sensitive enrichment calling (AUC≥0.93) than six competitor methods (AUC≤0.86) in ChIP-seq data with low (e.g. H3K36me3) and high (e.g. H3K4me3) signal-to-noise ratio (S/N). enrichR’s enrichment calls augment the resolution and comprehensiveness of chromatin segmentations by chromHMM, and its normalization improves on existing in silico and in vitro ChIP-seq normalization methods. Secondly, the three-component “regimeR” model dissects enrichment into two unprecedented regimes of different signal levels. A regimeR-based analysis identified two distinct facultative and constitutive heterochromatic enrichment regimes in H3K27me3 and H3K9me3 ChIP-seq data, respectively. The identified peak regions (high enrichment) resemble nucleation sites for heterochromatin embedded in regions of broad (low) enrichment. Lastly, the three-component “diffR” model calls conditional differences in ChIP-seq enrichment between two conditions. The diffR calls in low (H3K27me3) and high (H3K4me3) S/N ChIP-seq data are confirmed by a systematic comparison to four difference callers. Overall, normR represents a robust and versatile framework for the comprehensive analysis of ChIP-seq data, yet it can be readily applied to other NGS-based experiments like ATAC-seq, STARR-seq or RNA-seq.

Details

Language(s): eng - English
 Dates: 2017; 2017-06-28
 Publication Status: Published online
 Pages: xii, 145 S.
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Degree: PhD
