Abstract:
A dependence statistic, the Brownian Distance Covariance, has been proposed
by Székely and Rizzo for use in dependence measurement and independence testing;
we refer to this contribution henceforth as SR [we also note the earlier work on
this topic by Székely, Rizzo and Bakirov (2007)]. Some advantages of the authors’
approach are that the random variables X and Y being tested may take values in
R^p and R^q, respectively, for arbitrary dimensions p and q; and that the test is
consistent against all alternatives subject to the conditions E‖X‖_p < ∞ and
E‖Y‖_q < ∞.
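
For concreteness, the sample version of the squared distance covariance can be sketched as below. This is a minimal illustration of the double-centred distance-matrix form given by Székely, Rizzo and Bakirov (2007), not the authors' implementation; the function names are hypothetical, and each observation is assumed to be a tuple of floats, so X and Y may have different dimensions.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length tuples."""
    return [[math.dist(u, v) for v in points] for u in points]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # d is symmetric, so the column means equal the row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_covariance_sq(xs, ys):
    """Squared sample distance covariance of paired observations xs, ys."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    n = len(xs)
    a = double_center(pairwise_distances(xs))
    b = double_center(pairwise_distances(ys))
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


# Usage: X in R^2 and Y in R^1, illustrating differing dimensions p and q.
xs = [(0.0, 1.0), (1.0, 0.5), (2.0, 2.0), (3.0, 1.5)]
ys = [(0.1,), (0.9,), (2.2,), (2.8,)]
stat = distance_covariance_sq(xs, ys)
```

The statistic is zero when one of the samples is constant and grows as the pairwise distances in X and Y co-vary; an independence test would compare it with a null distribution, for instance one obtained by permuting ys.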
In our discussion we review and compare against a number of related dependence
measures that have appeared in the statistics and machine learning literature.
We begin with distances of the form of SR, equation (2.2), most notably the
work of Feuerverger (1993); Kankainen (1995); Kankainen and Ushakov (1998);
Ushakov (1999), which we describe in Section 2: these measures have been
formulated only for the case p = q = 1, however. In Section 3 we turn to more recent
dependence measures which are computed between mappings of the probability
distributions P_x, P_y, and P_xy of X, Y, and (X, Y), respectively, to
high-dimensional feature spaces: specifically, reproducing kernel Hilbert spaces (RKHSs).
The RKHS dependence statistics may be based on the distance [Smola et al.
(2007), Section 2.3], covariance [Gretton et al. (2005a, 2005b, 2008)], or
correlation [Dauxois and Nkiet (1998); Bach and Jordan (2002); Fukumizu, Bach and
Gretton (2007); Fukumizu et al. (2008)] between the feature mappings, and make
smoothness assumptions which can improve the power of the tests over approaches
relying on distances between the unmapped variables. When the RKHSs are
characteristic [Fukumizu et al. (2008); Sriperumbudur et al. (2008)], meaning that the
feature mapping from the space of probability measures to the RKHS is injective,
the kernel-based tests are consistent for all probability measures generating
(X, Y). RKHS-based tests apply on spaces R^p × R^q for arbitrary p and q. In fact,
kernel independence tests are applicable on a still broader range of (possibly
non-Euclidean) domains, which can include strings [Leslie et al. (2002)], graphs
[Gärtner, Flach and Wrobel (2003)], and groups [Fukumizu et al. (2009)], making
the kernel approach very general. In Section 4 we provide an empirical comparison
between the approach of SR and the kernel statistic of Gretton et al. (2005b, 2008)
on an independence testing benchmark.
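
To make the kernel covariance statistic concrete, the following is a minimal sketch of a biased empirical estimate in the trace form tr(KHLH)/n^2, where K and L are Gram matrices on the X and Y samples and H is the centring matrix. It is not the benchmark code of Section 4: the Gaussian kernel, the fixed bandwidths `sigma_x` and `sigma_y`, the 1/n^2 normalisation, and all function names are assumptions made for illustration.

```python
import math


def gaussian_gram(points, sigma=1.0):
    """Gram matrix of k(u, v) = exp(-||u - v||^2 / (2 * sigma^2))."""
    return [
        [math.exp(-math.dist(u, v) ** 2 / (2.0 * sigma ** 2)) for v in points]
        for u in points
    ]


def center(k):
    """Compute H K H for H = I - (1/n) 1 1^T, with K symmetric."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    grand_mean = sum(row_means) / n
    return [
        [k[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_biased(xs, ys, sigma_x=1.0, sigma_y=1.0):
    """Biased kernel covariance statistic tr(K H L H) / n^2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    n = len(xs)
    kc = center(gaussian_gram(xs, sigma_x))
    lc = center(gaussian_gram(ys, sigma_y))
    # tr(K H L H) equals the elementwise inner product of the centred matrices.
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


# Usage: paired observations given as tuples of floats.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [(0.2,), (0.8,), (2.1,), (3.3,)]
stat = hsic_biased(xs, ys)
```

Structurally this mirrors the distance covariance computation, with centred kernel matrices in place of centred distance matrices. Any positive definite kernel on the domain could replace `gaussian_gram`, which is how the approach extends to strings, graphs, or groups.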