Free keywords:
association measure; prior knowledge; gene regulatory network; distance correlation; Bayesian network
Abstract:
Reconstructing gene regulatory networks (GRNs) from expression data is a
challenging task that has become essential to the understanding of
complex regulatory mechanisms in cells. The major issues are the usually
very high ratio of the number of genes to the sample size, and the noise
in the available data. In this thesis we investigate the effect of the
number of samples and noise on the performance of statistical methods.
The results indicate that when few samples are available and/or the
noise level is high, as is the case for gene expression data, the
performance of all methods decreases significantly compared to the
well-behaved case (many samples and no noise).
Integrating biological prior knowledge into the learning process is a
natural and promising way to partially compensate for the lack of
reliable expression data and to increase the accuracy of network
reconstruction algorithms. In this thesis, we present PriorPC, a new
algorithm based on the PC algorithm that uses prior knowledge. Despite
being one of the most popular methods for Bayesian network
reconstruction, the PC algorithm is known to depend strongly on the
order in which nodes are presented, especially for large networks.
PriorPC exploits this flaw to include prior knowledge. We show on both
synthetic and real data that the structural accuracy of networks
obtained with PriorPC is greatly improved compared to the PC algorithm.
Furthermore, PriorPC is fast and scales well to large networks, which is
important for its applicability to experimental data.
Another challenge in GRN reconstruction is to detect (direct) nonlinear
interactions between genes. A recently proposed association measure
named distance correlation is a powerful method to find nonlinear
relationships. In this thesis, we propose a novel approach to estimate
partial distance correlation, a generalization of distance correlation
that accounts for the influence of other variables and can therefore
detect direct nonlinear relationships.
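
As a rough illustration of the (unconditioned) distance correlation mentioned above, the following sketch computes the standard sample statistic from double-centered pairwise distance matrices. It is not the partial distance correlation estimator proposed in the thesis; the function names and the toy data are assumptions made for this example only.

```python
import numpy as np


def double_centered_distances(x):
    """Pairwise Euclidean distance matrix of x, double-centered.

    Hypothetical helper: x is an array of shape (n,) or (n, d).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x and y (value in [0, 1])."""
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar_x = (a * a).mean()     # squared sample distance variance of x
    dvar_y = (b * b).mean()     # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0              # a constant input carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Toy data (assumed): y depends on x only through a nonlinear function.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

pearson = float(np.corrcoef(x, y)[0, 1])
dcor = distance_correlation(x, y)
print(f"Pearson correlation:  {pearson:.3f}")
print(f"Distance correlation: {dcor:.3f}")
```

On this toy example the Pearson correlation is essentially zero because the relationship is symmetric around zero, while the distance correlation is clearly positive. Conditioning on other variables, which is what distinguishes direct from indirect relationships, is not covered by this sketch.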