Free keywords:
Computer Science, Information Theory, cs.IT, Mathematics, Information Theory, math.IT
Abstract:
Estimating mutual information (MI) between two continuous random variables
$X$ and $Y$ allows us to capture non-linear dependencies between them,
non-parametrically. As such, MI estimation lies at the core of many data
science applications. Yet, robustly estimating MI for high-dimensional $X$ and
$Y$ is still an open research question.
In this paper, we formulate this problem through the lens of manifold
learning. That is, we leverage the common assumption that the information of
$X$ and $Y$ is captured by a low-dimensional manifold embedded in the observed
high-dimensional space and transfer it to MI estimation. As an extension to
state-of-the-art $k$NN estimators, we propose to determine the $k$-nearest
neighbours via geodesic distances on this manifold rather than distances in the ambient
space, which allows us to estimate MI even in the high-dimensional setting. An
empirical evaluation of our method, G-KSG, against the state of the art shows
that it yields good MI estimates on classical benchmark and manifold
tasks, even for high-dimensional datasets, which none of the existing methods
can provide.
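
The neighbour-selection idea described above can be illustrated with a minimal sketch. It assumes an Isomap-style approximation, in which geodesic distances are taken as shortest-path lengths through a nearest-neighbour graph; this construction, the function names (`knn_graph`, `geodesic_distances`, `geodesic_knn`), and the `n_neighbors` parameter are illustrative assumptions and not necessarily how G-KSG is implemented. The MI estimation step itself is not shown.

```python
import heapq
import math


def knn_graph(points, n_neighbors):
    """Symmetric nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:n_neighbors]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrise so every edge can be walked both ways
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from `source` along graph edges."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_knn(points, source, k, n_neighbors=2):
    """Indices of the k points closest to `source` in graph-geodesic distance."""
    graph = knn_graph(points, n_neighbors)
    dist = geodesic_distances(graph, source)
    ranked = sorted((d, j) for j, d in dist.items() if j != source)
    return [j for _, j in ranked[:k]]


# Toy 1-D manifold in 2-D: an open arc of the unit circle (0 to 300 degrees).
points = [
    (math.cos(math.radians(25 * i)), math.sin(math.radians(25 * i)))
    for i in range(13)
]

# Ambient-space 3-NN of point 0 versus graph-geodesic 3-NN of point 0.
ambient_knn = sorted(
    range(1, len(points)), key=lambda j: math.dist(points[0], points[j])
)[:3]
print("ambient: ", ambient_knn)
print("geodesic:", geodesic_knn(points, source=0, k=3))
```

On this toy arc, the ambient-space query for point 0 includes the point at the far end of the arc, which is close across the gap but far along the curve, whereas the geodesic query returns only points adjacent along the arc. A $k$NN-based MI estimator would then count neighbours using the second notion of proximity.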