Free keywords:
-
Abstract:
In biological data, it is often the case that observed data are available only for a subset of samples. When a kernel matrix is derived from such data, we have to leave the entries for unavailable samples as missing. In this paper, the missing entries are completed by exploiting an auxiliary kernel matrix derived from another information source. The parametric model of kernel matrices is created as a set of spectral variants of the auxiliary kernel matrix, and the missing entries are estimated by fitting this model to the existing entries. For model fitting, we adopt the em algorithm (distinguished from the EM algorithm of Dempster et al., 1977) based on the information geometry of positive definite matrices. We will report promising results on bacteria clustering experiments using two marker sequences: 16S and gyrB.
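The fitting procedure summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not state: the samples are ordered so that the observed ones come first, the model family consists of matrices sharing the eigenvectors of the auxiliary kernel with free positive eigenvalues, and the two alternating steps are the standard Gaussian KL projections (a conditional-Gaussian completion, then a closed-form eigenvalue update). The function name `em_complete` and its parameters are hypothetical.

```python
import numpy as np

def em_complete(K_vis, aux, n_iter=50, eps=1e-8):
    """Complete a kernel matrix whose top-left block K_vis is observed.

    Assumptions (illustrative, not taken from the paper's text):
    - K_vis is the (v, v) positive definite block for the first v samples;
    - aux is an (n, n) positive definite auxiliary kernel over all samples;
    - the model is M(b) = sum_j b_j v_j v_j^T, with v_j eigenvectors of aux.
    """
    v = K_vis.shape[0]
    w, V = np.linalg.eigh(aux)          # eigenvectors define the model family
    b = np.maximum(w, eps)              # start from the auxiliary spectrum
    D = None
    for _ in range(n_iter):
        M = (V * b) @ V.T               # current model matrix
        M_vv, M_vh, M_hh = M[:v, :v], M[:v, v:], M[v:, v:]
        # Completion step: keep K_vis fixed and fill the missing blocks so
        # that the hidden-given-visible structure matches the model M.
        A = np.linalg.solve(M_vv, M_vh)             # M_vv^{-1} M_vh
        D_vh = K_vis @ A
        D_hh = M_hh - M_vh.T @ A + A.T @ K_vis @ A
        D = np.block([[K_vis, D_vh], [D_vh.T, D_hh]])
        # Fitting step: closed-form eigenvalue update b_j = v_j^T D v_j.
        b = np.maximum(np.sum(V * (D @ V), axis=0), eps)
    return D
```

The returned matrix keeps the observed block unchanged and fills the remaining entries from the fitted model; under the stated assumptions it stays symmetric and positive definite, so it can be passed directly to a kernel-based clustering method.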