  A Fast, Consistent Kernel Two-Sample Test

Gretton, A., Fukumizu, K., Harchaoui, Z., & Sriperumbudur, B. (2010). A Fast, Consistent Kernel Two-Sample Test. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 22 (pp. 673-681). Red Hook, NY, USA: Curran.

Basic

Item Permalink: http://hdl.handle.net/11858/00-001M-0000-0013-C0BC-3 Version Permalink: http://hdl.handle.net/21.11116/0000-0002-93CA-4
Genre: Conference Paper

Creators

 Creators:
Gretton, A (1, 2), Author
Fukumizu, K, Author
Harchaoui, Z, Author
Sriperumbudur, BK, Author
Affiliations:
(1) Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
(2) Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

Free keywords: -
 Abstract: A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of χ² random variables. Prior finite sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high dimensional multivariate data, and text.
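The idea in the abstract, estimating the null distribution from the eigenspectrum of the Gram matrix on the aggregate sample, can be illustrated with a minimal sketch. This is not the authors' code. It assumes equal sample sizes m, a Gaussian kernel with a hand-picked bandwidth, the biased statistic m·MMD², and eigenvalues of the centered 2m × 2m Gram matrix scaled by 1/(2m) as weights on 2·χ²₁ draws. The function names, the default parameters, and these normalization choices are assumptions made for illustration and should be checked against the paper before use.

```python
import numpy as np


def gaussian_gram(z, sigma):
    """Gaussian-kernel Gram matrix on the rows of z (kernel choice is an assumption)."""
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def spectral_mmd_test(x, y, sigma=1.0, n_null=2000, alpha=0.05, seed=0):
    """Hypothetical helper: kernel two-sample test with a spectral null estimate.

    Returns (statistic, threshold, reject). Assumes x and y have the same
    number of rows m.
    """
    m = x.shape[0]
    if y.shape[0] != m:
        raise ValueError("this sketch assumes equal sample sizes")
    rng = np.random.default_rng(seed)

    # Gram matrix on the aggregate sample from P and Q.
    z = np.vstack([x, y])
    k = gaussian_gram(z, sigma)

    # Biased MMD^2 estimate, scaled by m.
    kxx, kyy, kxy = k[:m, :m], k[m:, m:], k[:m, m:]
    stat = m * (kxx.mean() + kyy.mean() - 2.0 * kxy.mean())

    # Eigenvalues of the centered Gram matrix, scaled by 1/(2m) (assumed convention).
    n = 2 * m
    h = np.eye(n) - np.ones((n, n)) / n
    eigs = np.linalg.eigvalsh(h @ k @ h)
    lam = np.clip(eigs, 0.0, None) / n

    # Draw from the weighted sum of chi-square variables: sum_l lam_l * 2 * chi2_1.
    draws = rng.standard_normal((n_null, n)) ** 2
    null_samples = 2.0 * (draws @ lam)

    threshold = np.quantile(null_samples, 1.0 - alpha)
    return stat, threshold, stat > threshold
```

Calling `spectral_mmd_test(x, y)` on two arrays of shape (m, d) returns the statistic, the estimated (1 - alpha) threshold, and the test decision. The full eigendecomposition costs O(m³) once, whereas a bootstrap recomputes the statistic for every resample.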

Details

Language(s):
 Dates: 2010-04
 Publication Status: Published in print
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: BibTex Citekey: 6132
 Degree: -

Event

Title: 23rd Annual Conference on Neural Information Processing Systems (NIPS 2009)
Place of Event: Vancouver, BC, Canada
Start-/End Date: 2009-12-07 - 2009-12-10

Source 1

Title: Advances in Neural Information Processing Systems 22
Source Genre: Proceedings
 Creator(s):
Bengio, Y, Editor
Schuurmans, D, Editor
Lafferty, J, Editor
Williams, C, Editor
Culotta, A, Editor
Affiliations:
-
Publ. Info: Red Hook, NY, USA : Curran
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 673 - 681
Identifier: ISBN: 978-1-615-67911-9