Journal Article

Lexibank, a public repository of standardized wordlists with computed phonological and lexical features

Authors

List, Johann-Mattis
Forkel, Robert
Greenhill, Simon J.
Rzymski, Christoph
Englisch, Johannes
Gray, Russell D.

Affiliation (all authors): Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society

Fulltext (public): List_Lexibank_SciData_2022.pdf (publisher version, 10 MB)

Citation

List, J.-M., Forkel, R., Greenhill, S. J., Rzymski, C., Englisch, J., & Gray, R. D. (2022). Lexibank, a public repository of standardized wordlists with computed phonological and lexical features. Scientific Data, 9: 316. doi:10.1038/s41597-022-01432-0.
Cite as: https://hdl.handle.net/21.11116/0000-000A-9A5D-1
Abstract
The past decades have seen substantial growth in digital data on the world’s languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardization, which makes them difficult to compare. Here, we present a new approach to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats (CLDF), a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR). We test the Lexibank workflow on 100 lexical datasets, from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2,000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.
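To make the idea of "automatically inferred" features concrete, here is a minimal sketch, not the Lexibank code. It assumes each wordlist entry is a (language, concept, segments) row, with segments given as space-separated sound segments in a unified transcription, similar in spirit to the Segments column of a CLDF form table. It derives one phonological feature (the set of attested segments per language, whose size approximates inventory size) and one lexical feature (whether two concepts share a form, i.e. are colexified). The rows, language names and helper functions are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy wordlist (invented data): (language, concept, segments).
# Segments are space-separated sound segments in a unified transcription.
WORDLIST = [
    ("lang_a", "HAND", "m a n o"),
    ("lang_a", "ARM", "m a n o"),
    ("lang_a", "WATER", "a g w a"),
    ("lang_b", "HAND", "h ä n t"),
    ("lang_b", "ARM", "a r m"),
    ("lang_b", "WATER", "v a s ə r"),
]


def inventories(rows):
    """Phonological feature: the set of distinct segments attested per language."""
    inv = defaultdict(set)
    for lang, _concept, segments in rows:
        inv[lang].update(segments.split())
    return dict(inv)


def colexifies(rows, concept_a, concept_b):
    """Lexical feature: per language, True if some form expresses both concepts.

    Returns None for a language when either concept is missing from its
    wordlist, because the feature cannot be decided from the data.
    """
    forms = defaultdict(lambda: defaultdict(set))
    for lang, concept, segments in rows:
        forms[lang][concept].add(segments)
    result = {}
    for lang, by_concept in forms.items():
        a, b = by_concept.get(concept_a), by_concept.get(concept_b)
        result[lang] = None if a is None or b is None else bool(a & b)
    return result


if __name__ == "__main__":
    sizes = {lang: len(segs) for lang, segs in inventories(WORDLIST).items()}
    print(sizes)
    print(colexifies(WORDLIST, "HAND", "ARM"))
```

Because every form is already segmented in one shared transcription, both features reduce to simple set operations. This is the practical payoff of standardization that the abstract describes: the same code can run unchanged over any dataset that follows the common format.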