Conference Paper

Restrictive Clustering and Metaclustering for self-organizing Document Collections

MPS-Authors
Siersdorfer, Stefan
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Sizov, Sergej
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Siersdorfer, S., & Sizov, S. (2004). Restrictive Clustering and Metaclustering for self-organizing Document Collections. In Proceedings of SIGIR 2004: the Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 226-233). New York, USA: ACM.


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-2B27-F
Abstract
This paper addresses the problem of automatically structuring heterogeneous document collections by using clustering methods. In contrast to traditional clustering, we study restrictive methods and ensemble-based meta methods that may decide to leave out some documents rather than assigning them to inappropriate clusters with low confidence. These techniques yield higher cluster purity and better overall accuracy, and they make unsupervised self-organization more robust. Our comprehensive experimental studies on three different real-world data collections demonstrate these benefits. The proposed methods seem particularly suitable for automatically substructuring personal email folders or personal Web directories that are populated by focused crawlers, and they can be combined with supervised classification techniques.
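
The general idea of abstaining instead of forcing a low-confidence assignment can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which this record does not describe. The function names, the cosine-similarity margin used as a confidence measure, the `min_margin` threshold, and the unanimity rule for the ensemble are all assumptions made for illustration. The ensemble step also assumes that cluster labels from the individual methods have already been aligned with one another.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def restrictive_assign(doc, centroids, min_margin=0.2):
    """Return the index of the most similar centroid, or None to abstain.

    Hypothetical confidence rule: the best similarity must beat the
    runner-up by at least min_margin, otherwise the document is left out.
    """
    sims = sorted(((cosine(doc, c), i) for i, c in enumerate(centroids)), reverse=True)
    best_sim, best_idx = sims[0]
    runner_up = sims[1][0] if len(sims) > 1 else 0.0
    return best_idx if best_sim - runner_up >= min_margin else None


def meta_assign(votes):
    """Hypothetical ensemble rule: keep a label only if every method agrees.

    votes holds one label per clustering method (None means that method
    abstained); labels are assumed to be aligned across methods.
    """
    first = votes[0]
    if first is not None and all(v == first for v in votes):
        return first
    return None


# Toy data: two cluster centroids and three documents in a 2-term space.
centroids = [[1.0, 0.0], [0.0, 1.0]]
docs = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.8]]
labels = [restrictive_assign(d, centroids) for d in docs]
print(labels)  # the ambiguous middle document is left unassigned
```

In this toy example the second document is equally similar to both centroids, so it is left out rather than assigned arbitrarily. Raising `min_margin` leaves more documents unassigned in exchange for more confident assignments among the rest.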