Abstract:
When predicting the functions of unannotated proteins from a protein network, one relies on some notion of “closeness” or “distance” among the nodes. However, inferring closeness among the nodes is an extremely ill-posed problem, because the proximity information provided by the edges is only local. Moreover, it is preferable that the resulting similarity matrix be a valid kernel matrix, so that function prediction can be done by support vector machines (SVMs) or other high-performance kernel classifiers [2]. Maximum entropy methods have proven effective for solving general ill-posed problems. However, these methods are concerned with the estimation of a probability distribution, not a kernel matrix. In this work, we generalize the maximum entropy framework to estimate a positive definite kernel matrix.
We found that the diffusion kernel [1], which has been used successfully for making predictions from biological networks (e.g. [3]), can be derived from this framework. However, one drawback inherent in the diffusion kernel is that, in the feature space, the distances between connected samples have high variance. As a result, some of the samples are outliers, which should be avoided for reliable statistical inference. Our new kernel based on local constraints resolves this problem and thereby shows better accuracy in yeast function prediction.
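As an illustration of the drawback noted above, the following minimal sketch computes a diffusion kernel in its standard form K = exp(-βL), with L the graph Laplacian, and inspects the squared feature-space distances between connected nodes. The toy graph, the value of `beta`, and the function names are assumptions made for illustration only; the local-constraint kernel is not specified in this abstract and is not implemented here.

```python
import numpy as np

def diffusion_kernel(adjacency, beta=1.0):
    """Return exp(-beta * L) for the graph Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so exp(-beta * L) = V diag(exp(-beta * w)) V^T.
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-beta * w)) @ V.T

def edge_distances(K, adjacency):
    """Squared feature-space distances K_ii + K_jj - 2 K_ij over all edges."""
    A = np.asarray(adjacency)
    i, j = np.nonzero(np.triu(A, k=1))
    return K[i, i] + K[j, j] - 2.0 * K[i, j]

# Hypothetical toy network: hub node 0 linked to 1, 2, 3, plus a chain 3-4-5.
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]:
    A[a, b] = A[b, a] = 1.0

K = diffusion_kernel(A, beta=0.5)  # beta chosen arbitrarily for the example
d = edge_distances(K, A)
print("smallest eigenvalue of K:", np.linalg.eigvalsh(K).min())
print("squared distances on edges:", np.round(d, 3))
print("variance of edge distances:", d.var())
```

Because every eigenvalue of K is the exponential of a real number, K is positive definite and can be passed to an SVM as a precomputed kernel. The spread of the edge distances is the quantity the abstract refers to when it says connected samples have high variance in feature space.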