English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

VarClass: An open-source language identification tool for language varieties

MPS-Authors
/persons/resource/persons4454

Gebre,  Binyam Gebrekidan
The Language Archive, MPI for Psycholinguistics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

Zampieri_Gebre_2014.pdf
(Publisher version), 128KB

Supplementary Material (public)
There is no public supplementary material available
Citation

Zampieri, M., & Gebre, B. G. (2014). VarClass: An open-source language identification tool for language varieties. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, et al. (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 3305-3308).


Cite as: https://hdl.handle.net/11858/00-001M-0000-0024-3238-7
Abstract
This paper presents VarClass, an open-source tool for language identification available both to be downloaded as well as through a graphical user-friendly interface. The main difference of VarClass in comparison to other state-of-the-art language identification tools is its focus on language varieties. General purpose language identification tools do not take language varieties into account and our work aims to fill this gap. VarClass currently contains language models for over 27 languages in which 10 of them are language varieties. We report an average performance of over 90.5% accuracy in a challenging dataset. More language models will be included in the upcoming months