Opening the Romance Verbal Inflection Dataset 2.0: a CLDF lexicon

Beniamine, Sacha; Maiden, Martin; Round, Erich

doi:10.5281/zenodo.3611076

Conference Paper

MPS-Authors

Beniamine, Sacha
Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Max Planck Society

Round, Erich
Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Max Planck Society

Fulltext (public)

shh2602.pdf
(Publisher version), 389KB

Citation

Beniamine, S., Maiden, M., & Round, E. (2020). Opening the Romance Verbal Inflection Dataset 2.0: a CLDF lexicon. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, H. M. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 3027-3035). Paris: European Language Resources Association (ELRA). doi:10.5281/zenodo.3611076.

Cite as: https://hdl.handle.net/21.11116/0000-0006-6E8B-3

Abstract

We introduce the Romance Verbal Inflection Dataset 2.0, a multilingual lexicon of Romance inflection covering 74 varieties. The lexicon provides verbal paradigm forms in broad IPA phonemic notation. Both lexemes and paradigm cells are organized to reflect cognacy. Such multilingual inflected lexicons, annotated for two dimensions of cognacy, are necessary to study the evolution of inflectional paradigms and to test linguistic hypotheses systematically. However, these resources seldom exist, and when they do, they are usually not encoded in computationally usable ways. The Oxford Online Database of Romance Verb Morphology provides this kind of information; however, it is no longer maintained and is available only as a web service without machine-readable interfaces. We collect its data, then clean and correct it for consistency using both heuristics and expert annotator judgements. Most resources used to study language evolution computationally rely strictly on contemporary multilingual information and lack information about prior stages of the languages. To provide such information, we augmented the database with Latin paradigms from the LatInfLexi lexicon. Finally, to make it widely available, the resource is released under a GPLv3 license in CLDF format.