Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis

Authors: Bast, Holger; Dupret, Georges; Majumdar, Debapriyo; Piwowarski, Benjamin
Editors: Ackermann, Markus; Berendt, Bettina; Grobelnik, Marko; Hotho, Andreas; Mladenic, Dunja; Semeraro, Giovanni; Spiliopoulou, Myra; Stumme, Gerd; Svatek, Vojtech; van Someren, Maarten W.

Conference Paper

MPS-Authors

Bast, Holger
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Dupret, Georges
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Majumdar, Debapriyo
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Citation

Bast, H., Dupret, G., Majumdar, D., & Piwowarski, B. (2006). Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis. In Semantics, Web and Mining: Joint International Workshops, EWMF 2005 and KDO 2005 (pp. 103-120). Berlin, Germany: Springer.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-2295-E

Abstract

We show that eigenvector decomposition can be used to extract a term taxonomy from a given collection of text documents. Until now, methods based on eigenvector decomposition, such as latent semantic indexing (LSI) or principal component analysis (PCA), had only been known to be useful for extracting symmetric relations between terms. We give a precise mathematical criterion for distinguishing between four kinds of relation between a given pair of terms in a given collection: unrelated (car - fruit), symmetrically related (car - automobile), asymmetrically related with the first term more specific than the second (banana - fruit), and asymmetrically related in the other direction (fruit - banana). We give theoretical evidence for the soundness of our criterion by showing that, in a simplified mathematical model, the criterion behaves as one would intuitively expect. We applied our scheme to the reconstruction of a selected part of the Open Directory Project (ODP) hierarchy, with promising results.
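The abstract does not spell out the paper's asymmetric criterion, so the sketch below only illustrates the symmetric baseline it builds on: LSI-style term similarity obtained from an eigenvector decomposition. It is a minimal sketch under stated assumptions, not the authors' method. The toy corpus, the helper names `term_document_matrix` and `lsi_term_similarities`, and the choice `k=2` are all hypothetical. The eigenvectors of A Aᵀ (for a term-by-document matrix A) are the left singular vectors of A, which is the term side of LSI.

```python
import numpy as np

# Hypothetical toy corpus for illustration only (not the ODP data from the paper).
# Note that "car" and "automobile" never occur in the same document.
TERMS = ["car", "automobile", "engine", "fruit", "banana", "apple"]
DOCS = [
    ["car", "engine"],
    ["automobile", "engine"],
    ["car", "engine"],
    ["fruit", "banana"],
    ["fruit", "apple"],
    ["banana", "fruit"],
]


def term_document_matrix(terms, docs):
    """Hypothetical helper: build a term-by-document count matrix A."""
    index = {t: i for i, t in enumerate(terms)}
    a = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            a[index[t], j] += 1
    return a


def lsi_term_similarities(a, k):
    """Hypothetical helper: cosine similarity of terms in a k-dim eigenspace.

    Eigenvectors of A A^T are the left singular vectors of A (LSI's term side).
    The resulting matrix is symmetric, so it cannot by itself say which of two
    related terms is the more specific one.
    """
    eigvals, eigvecs = np.linalg.eigh(a @ a.T)  # eigenvalues in ascending order
    top = eigvecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    # Scale each axis by its singular value sqrt(lambda); clip tiny negatives.
    emb = top * np.sqrt(np.clip(eigvals[-k:], 0.0, None))
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


a = term_document_matrix(TERMS, DOCS)
sim = lsi_term_similarities(a, k=2)
idx = {t: n for n, t in enumerate(TERMS)}
print(sim[idx["car"], idx["automobile"]])  # close to 1.0 (shared context: "engine")
print(sim[idx["car"], idx["fruit"]])  # close to 0.0
```

In this toy corpus, car and automobile end up close to each other only because both co-occur with "engine"; their raw document vectors are orthogonal. The similarity matrix is symmetric by construction, which is exactly the limitation the abstract says earlier LSI/PCA methods had.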