Help Privacy Policy Disclaimer
  Advanced SearchBrowse


  Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D

Nikumbh, S. (2019). Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-28153.

Item is


show Files




Nikumbh, Sarvesh1, 2, Author              
Pfeifer, Nico1, Advisor              
Marschall, Tobias1, Referee              
Ebert, Peter1, Referee              
1Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society, ou_40046              
2International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551              


Free keywords: -
 Abstract: With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest to help enhance our comprehension of the complex, multipronged regulatory mechanisms of the genome. The work described in this dissertation largely focuses on development and application of interpretable machine learning methods for prediction and analysis of long-range genomic interactions output from chromatin interaction experiments. In the first part, we demonstrate that the genetic sequence information at the ge- nomic loci is predictive of the long-range interactions of a particular locus of interest (LoI). For example, the genetic sequence information at and around enhancers can help predict whether it interacts with a promoter region of interest. This is achieved by building string kernel-based support vector classifiers together with two novel, in- tuitive visualization methods. These models suggest a potential general role of short tandem repeat motifs in the 3D genome organization. But, the insights gained out of these models are still coarse-grained. To this end, we devised a machine learning method, called CoMIK for Conformal Multi-Instance Kernels, capable of providing more fine-grained insights. When comparing sequences of variable length in the su- pervised learning setting, CoMIK can not only identify the features important for classification but also locate them within the sequence. Such precise identification of important segments of the whole sequence can help in gaining de novo insights into any role played by the intervening chromatin towards long-range interactions. Although CoMIK primarily uses only genetic sequence information, it can also si- multaneously utilize other information modalities such as the numerous functional genomics data if available. The second part describes our pipeline, pHDee, for easy manipulation of large amounts of 3D genomics data. We used the pipeline for analyzing HiChIP experimen- tal data for studying the 3D architectural changes in Ewing sarcoma (EWS) which is a rare cancer affecting adolescents. In particular, HiChIP data for two experimen- tal conditions, doxycycline-treated and untreated, and for primary tumor samples is analyzed. We demonstrate that pHDee facilitates processing and easy integration of large amounts of 3D genomics data analysis together with other data-intensive bioinformatics analyses.


Language(s): eng - English
 Dates: 2019-06-242019-07-052019
 Publication Status: Published in print
 Pages: 150 p.
 Publishing info: Saarbrücken : Universität des Saarlandes
 Table of Contents: -
 Rev. Type: -
 Identifiers: BibTex Citekey: Nikumbhphd2019
DOI: 10.22028/D291-28153
 Degree: PhD



Legal Case


Project information