  Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions

Bubeck, S., & von Luxburg, U. (2009). Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions. The Journal of Machine Learning Research, 10, 657-698.

Basic

Item Permalink: http://hdl.handle.net/11858/00-001M-0000-0013-C591-8
Version Permalink: http://hdl.handle.net/21.11116/0000-0002-C9A4-2
Genre: Journal Article

Creators

Creators:
Bubeck, S., Author
von Luxburg, U., Author (affiliations 1, 2)
Affiliations:
1. Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
2. Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

 Abstract: Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal, and instead can lead to inconsistency. We construct examples which provably have this behavior. As in the case of supervised learning, the cure is to restrict the size of the function classes under consideration. For appropriate “small” function classes we can prove very general consistency theorems for clustering optimization schemes. As one particular algorithm for clustering with a restricted function space we introduce “nearest neighbor clustering”. Similar to the k-nearest neighbor classifier in supervised learning, this algorithm can be seen as a general baseline algorithm to minimize arbitrary clustering objective functions. We prove that it is statistically consistent for all commonly used clustering objective functions.
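
The idea of optimizing a clustering objective over a restricted function class can be illustrated with a small sketch. It is not the authors' implementation. It assumes one particular restriction: a few seed points are drawn from the sample, and only partitions that are constant on each seed's nearest-neighbor cell are considered. The function names (`wss`, `nearest_neighbor_clustering`), the uniform random choice of seeds, the brute-force search over seed labelings, and the within-cluster sum of squares objective are all illustrative assumptions; any other objective with the same signature could be passed in.

```python
import itertools

import numpy as np


def wss(points, labels, k):
    """Example objective (an assumption): within-cluster sum of squared
    distances to the cluster means. Partitions with an empty cluster
    are rejected by returning infinity."""
    total = 0.0
    for c in range(k):
        members = points[labels == c]
        if len(members) == 0:
            return float("inf")
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total


def nearest_neighbor_clustering(points, k, m, objective=wss, seed=0):
    """Minimize `objective` over partitions that are constant on the
    nearest-neighbor cells of m randomly chosen seed points.

    The search enumerates all k**m seed labelings, so m must be small.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)

    # Hypothetical seed selection: m distinct sample points, uniformly at random.
    seed_idx = rng.choice(len(points), size=m, replace=False)
    seeds = points[seed_idx]

    # For every sample point, the index of its nearest seed.
    sq_dists = ((points[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    nearest = sq_dists.argmin(axis=1)

    best_labels, best_value = None, float("inf")
    for seed_labels in itertools.product(range(k), repeat=m):
        # Each point inherits the label of its nearest seed.
        labels = np.asarray(seed_labels)[nearest]
        value = objective(points, labels, k)
        if value < best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value
```

Because the number of candidate partitions is k**m rather than k**n, the size of the function class is controlled by m and not by the sample size. That is the kind of restriction the abstract refers to, under the assumptions stated above.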

Details

 Dates: 2009-03
 Publication Status: Published in print
 Identifiers: BibTex Citekey: 5687

Source 1

Title: The Journal of Machine Learning Research
Source Genre: Journal
Publ. Info: Cambridge, MA : MIT Press
Volume / Issue: 10
Start / End Page: 657 - 698
Identifier: ISSN: 1532-4435
CoNE: https://pure.mpg.de/cone/journals/resource/111002212682020_1