Inferring Transcriptional Regulators Using Clustered Multi-Task Regression

Heinen, Tobias

Heinen, T. (2018). Inferring Transcriptional Regulators Using Clustered Multi-Task Regression. Master Thesis, Universität des Saarlandes, Saarbrücken.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0002-B37A-B
Version Permalink: https://hdl.handle.net/21.11116/0000-0002-B37E-7

Genre: Thesis

Files

2018_Tobais Heinen_MSc Thesis.pdf (Any fulltext), 5MB

File Permalink:
-

Name:
2018_Tobais Heinen_MSc Thesis.pdf

Description:
-

OA-Status:

Visibility:
Restricted (Max Planck Institute for Informatics, MSIN; )

MIME-Type / Checksum:
application/pdf

Technical Metadata:

Copyright Date:
-

Copyright Info:
-

License:
-

Creators

Creators:
Heinen, Tobias¹, Author
Schulz, Marcel Holger², Advisor
Marschall, Tobias², Referee

Affiliations:
¹ International Max Planck Research School, MPI for Informatics, Max Planck Society, ou_1116551
² Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society, ou_40046

Content

Free keywords: -

Abstract: Sparse linear regression is often used to identify key transcriptional regulators by predicting gene expression abundance from regulatory features such as transcription factor (TF) binding or epigenomics data. However, a single linear model explaining the expression of thousands of genes is limited in its ability to capture the complexity of cis-regulatory modules and gene co-expression patterns. Indeed, certain TFs are known to act as either activators or repressors depending on associated cofactors and neighbouring DNA-bound proteins. It is therefore desirable to identify clusters or modules of co-regulated genes and model their regulatory profiles separately.

Finite mixtures of regression models are a popular tool for modeling heterogeneous data while maintaining a linearity assumption. Unfortunately, they do not take advantage of available data sets containing the molecular profiles of many biological samples. We propose to combine the power of mixture modeling and multi-task learning by using a penalized maximum likelihood framework for inferring gene modules and regulators in multiple samples simultaneously. More specifically, we regularize the likelihood function with a tree-structured L1/L2 penalty to enable knowledge transfer between models of related cells. We optimize the parameters of our models with a generalized EM algorithm. Experimental evaluation of our method on synthetic data suggests that multi-task mixture modeling is better suited to identifying the true underlying cluster structure than a single-task regression mixture model. Finally, we apply the model to a dataset from the BLUEPRINT project consisting of various types of haematopoietic cells and uncover interesting regulatory patterns.
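
Illustrative sketch (not part of the record): the two building blocks named in the abstract can be outlined in a few lines of Python. The code below is not taken from the thesis. It assumes a plain single-task EM algorithm for a Gaussian mixture of linear regressions with an unpenalized weighted least-squares M-step, and a tree-guided group penalty of the form sum over tree nodes of weight times the per-feature L2 norm across the samples under that node. The function names `fit_regression_mixture` and `tree_group_penalty`, all parameters, and the exact penalty form are assumptions made for illustration; the thesis combines both ideas inside a generalized EM, which this sketch does not do.

```python
import numpy as np


def fit_regression_mixture(X, y, n_components=2, n_iter=100, ridge=1e-6, seed=0):
    """EM for a Gaussian mixture of linear regressions (single task).

    Hypothetical helper: no sparsity or tree penalty is applied here.
    X has shape (n_genes, n_features), y has shape (n_genes,).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Random soft cluster assignments to start; each row sums to one.
    resp = rng.dirichlet(np.ones(n_components), size=n)
    coefs = np.zeros((n_components, p))
    sigma2 = np.ones(n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # M-step: weighted least squares, noise variance, and mixing weight
        # for each component, using the current responsibilities.
        for k in range(n_components):
            r = resp[:, k]
            A = X.T @ (r[:, None] * X) + ridge * np.eye(p)
            coefs[k] = np.linalg.solve(A, X.T @ (r * y))
            resid = y - X @ coefs[k]
            sigma2[k] = max((r @ resid**2) / max(r.sum(), 1e-12), 1e-8)
            weights[k] = r.mean()
        # E-step: posterior responsibilities, computed in log space.
        resid = y[:, None] - X @ coefs.T  # shape (n, n_components)
        log_dens = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)
        log_post = np.log(np.maximum(weights, 1e-12)) + log_dens
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return coefs, sigma2, weights, resp


def tree_group_penalty(B, groups, group_weights):
    """Assumed tree-structured L1/L2 penalty for one mixture component.

    B has shape (n_features, n_tasks). Each entry of `groups` lists the task
    (sample) indices below one tree node. For every node, the L2 norm is taken
    across the node's tasks per feature, then summed over features (L1).
    """
    total = 0.0
    for idx, w in zip(groups, group_weights):
        total += w * np.linalg.norm(B[:, idx], axis=1).sum()
    return total
```

In a multi-task setting, a penalty of this kind would be added to the negative log-likelihood so that samples sharing a tree node (for example, closely related cell types) are encouraged to select the same regulators.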

Details

Language(s): eng - English

Dates: Accepted: 2018-05-23; Date issued: 2018

Publication Status: Issued

Pages: 93 p.

Publishing info: Saarbrücken : Universität des Saarlandes

Table of Contents: -

Rev. Type: -

Identifiers: BibTeX Citekey: HeinenMaster2018

Degree: Master
