



Conference Paper

Automatic annotation of bibliographical references for descriptive language materials


Hammarström, Harald
Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;


Hammarström, H. (2011). Automatic annotation of bibliographical references for descriptive language materials. In P. Forner, J. Kekäläinen, M. Lalmas, & M. De Rijke (Eds.), Multilingual and multimodal information access evaluation. Second International Conference of the Cross-Language Evaluation Forum, CLEF 2011, Amsterdam, The Netherlands, September 19-22, 2011; Proceedings (pp. 62-73). Berlin: Springer.

Cite as: http://hdl.handle.net/11858/00-001M-0000-0013-78D8-8
The present paper considers the problem of annotating bibliographical references with labels/classes, given training data of references already annotated with labels. The problem is an instance of document categorization in which the documents are short and written in a wide variety of languages. The skewed distributions of title words and labels call for special care when choosing a Machine Learning approach. The paper describes how to induce Disjunctive Normal Form formulae (DNFs), which have several advantages over Decision Trees. The approach is evaluated on a large real-world collection of bibliographical references.
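To make the idea of a DNF over title words concrete, the following is a minimal sketch of how such a formula could be applied to a reference title. It only illustrates evaluation, not the induction procedure described in the paper. The label, the example words, and the names `tokenize`, `dnf_matches`, and `GRAMMAR_DNF` are all hypothetical and are not taken from the paper or its data.

```python
import re

def tokenize(title):
    """Lowercase a title and return its set of word tokens."""
    return set(re.findall(r"\w+", title.lower()))

def dnf_matches(dnf, title):
    """Return True if any conjunction (term) of the DNF holds for the title.

    A DNF is a list of terms; each term is a pair
    (required_words, forbidden_words). A term holds when every required
    word occurs in the title and no forbidden word does.
    """
    words = tokenize(title)
    return any(
        required <= words and not (forbidden & words)
        for required, forbidden in dnf
    )

# Hypothetical formula for a made-up label "grammar":
#   (grammar AND NOT dictionary) OR grammatik OR (phonology AND morphology)
GRAMMAR_DNF = [
    (frozenset({"grammar"}), frozenset({"dictionary"})),
    (frozenset({"grammatik"}), frozenset()),
    (frozenset({"phonology", "morphology"}), frozenset()),
]

if __name__ == "__main__":
    for title in [
        "A Grammar of Kham",
        "Grammatik der Ewe-Sprache",
        "Dictionary and grammar sketch",
    ]:
        print(title, "->", dnf_matches(GRAMMAR_DNF, title))
```

Because each term is a small conjunction of words, a formula like this can list trigger words from several languages side by side, which is one plausible reason a disjunctive form suits short, multilingual titles.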