Conference Paper

Grambank’s typological advances support computational research on diverse languages

MPS-Authors

Haynie, Hannah J.
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;

Blasi, Damián
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;

Skirgård, Hedvig
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;

Gray, Russell D.
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;

Citation

Haynie, H. J., Blasi, D., Skirgård, H., Greenhill, S. J., Atkinson, Q. D., & Gray, R. D. (2023). Grambank’s typological advances support computational research on diverse languages. In L. Beinborn, K. Goswami, S. Muradoğlu, A. Sorokin, R. Kumar, A. Scherbakov, et al. (Eds.), The 5th workshop on research in computational linguistic typology and multilingual NLP: proceedings of the workshop (pp. 147-149). Stroudsburg: Association for Computational Linguistics.


Cite as: https://hdl.handle.net/21.11116/0000-000D-1213-9
Abstract
Sound correspondence patterns form the basis of cognate detection and phonological reconstruction in historical language comparison. Methods for the automatic inference of correspondence patterns from phonetically aligned cognate sets have been proposed, but their application to multilingual wordlists requires extremely well-annotated datasets. Since annotation is tedious and time-consuming, it would be desirable to find ways to improve aligned cognate data automatically. Taking inspiration from trimming techniques in evolutionary biology, which improve alignments by excluding problematic sites, we propose a workflow that trims phonetic alignments in comparative linguistics prior to the inference of correspondence patterns. Testing these techniques on a large standardized collection of ten datasets with expert annotations from different language families, we find that the best trimming technique substantially improves the overall consistency of the alignments, showing a clear increase in the proportion of frequent correspondence patterns and words exhibiting regular cognate relations.
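
The abstract does not say which trimming criteria the authors use. As a rough illustration of the general idea of "excluding problematic sites", the sketch below assumes one simple criterion borrowed from biological alignment trimming: drop every alignment column whose proportion of gaps exceeds a threshold. The function names, the gap symbol, the 0.5 threshold, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List

GAP = "-"  # assumed gap symbol; not specified in the abstract


def gap_proportions(alignment: List[List[str]]) -> List[float]:
    """Return, for each column, the share of rows that have a gap there."""
    n_rows = len(alignment)
    width = len(alignment[0])
    return [
        sum(1 for row in alignment if row[col] == GAP) / n_rows
        for col in range(width)
    ]


def trim_alignment(
    alignment: List[List[str]], threshold: float = 0.5
) -> List[List[str]]:
    """Drop columns whose gap proportion is above `threshold`.

    This is a generic gap-based trimming sketch, not the paper's method.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    props = gap_proportions(alignment)
    keep = [col for col, p in enumerate(props) if p <= threshold]
    return [[row[col] for col in keep] for row in alignment]


# Made-up toy alignment: three aligned words, one segment per cell.
toy_alignment = [
    ["h", "a", "n", "d", "-"],
    ["h", "a", "n", "t", "-"],
    ["h", "a", "n", "d", "a"],
]

trimmed = trim_alignment(toy_alignment, threshold=0.5)
for row in trimmed:
    print(" ".join(row))
```

In the toy data only the last column is gap-heavy (two gaps in three rows), so it is the only one removed. A stricter or looser threshold changes how aggressively sites are excluded.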