English
 
User Manual Privacy Policy Disclaimer Contact us
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

Automatic annotation of bibliographical references for descriptive language materials

MPS-Authors
/persons/resource/persons72724

Hammarström,  Harald
Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;

Locator
There are no locators available
Fulltext (public)
Supplementary Material (public)
There is no public supplementary material available
Citation

Hammarström, H. (2011). Automatic annotation of bibliographical references for descriptive language materials. In P. Forner, J. Kekäläinen, M. Lalmas, & M. De Rijke (Eds.), Multilingual and multimodal information access evaluation. Second International Conference of the Cross-Language Evaluation Forum, CLEF 2011, Amsterdam, The Netherlands, September 19-22, 2011; Proceedings (pp. 62-73). Berlin: Springer.


Cite as: http://hdl.handle.net/11858/00-001M-0000-0013-78D8-8
Abstract
The present paper considers the problem of annotating bibliographical references with labels/classes, given training data of references already annotated with labels. The problem is an instance of document categorization where the documents are short and written in a wide variety of languages. The skewed distributions of title words and labels calls for special carefulness when choosing a Machine Learning approach. The present paper describes how to induce Disjunctive Normal Form formulae (DNFs), which have several advantages over Decision Trees. The approach is evaluated on a large real-world collection of bibliographical references.