Deutsch
 
Hilfe Datenschutzhinweis Impressum
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Konferenzbeitrag

Pruning nearest neighbor cluster trees

MPG-Autoren
/persons/resource/persons84024

Kpotufe,  S
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

/persons/resource/persons76237

von Luxburg,  U
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

Externe Ressourcen

http://www.icml-2011.org/
(Inhaltsverzeichnis)

Volltexte (beschränkter Zugriff)
Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.
Volltexte (frei zugänglich)
Es sind keine frei zugänglichen Volltexte in PuRe verfügbar
Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Kpotufe, S., & von Luxburg, U. (2011). Pruning nearest neighbor cluster trees. In L. Getoor, & T. Scheffer (Eds.), 28th International Conference on Machine Learning (ICML 2011) (pp. 225-232). Madison, WI, USA: International Machine Learning Society.


Zitierlink: https://hdl.handle.net/11858/00-001M-0000-0013-BB2A-E
Zusammenfassung
Nearest neighbor (k-NN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. Moreover, is it possible to identify spurious structures that might arise due to sampling variability? Our first contribution is a statistical analysis that reveals how certain subgraphs of a k-NN graph form a consistent estimator of the cluster tree of the underlying distribution of points. Our second and perhaps most important contribution is the following finite sample guarantee. We carefully work out the tradeoff between aggressive and conservative pruning and are able to guarantee the removal of all spurious cluster structures while at the same time guaranteeing the recovery of salient clusters. This is the first such finite sample result in the context of clustering.