Managing historical linguistic data for computational phylogenetics and computer-assisted language comparison


Tresoldi, Tiago
CALC, Max Planck Institute for the Science of Human History, Max Planck Society;
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;


Rzymski, Christoph
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;


Forkel, Robert
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;


Greenhill, Simon J.
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;


List, Johann-Mattis
CALC, Max Planck Institute for the Science of Human History, Max Planck Society;
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;


Gray, Russell D.
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;

Tresoldi, T., Rzymski, C., Forkel, R., Greenhill, S. J., List, J.-M., & Gray, R. D. (2022). Managing historical linguistic data for computational phylogenetics and computer-assisted language comparison. In A. L. Berez-Kroeker, B. McDonnell, & E. Koller (Eds.), The open handbook of linguistic data management (pp. 345-354). Massachusetts: The MIT Press.

The popularisation of computer-based methods in comparative linguistics has led to a greater awareness of the problems that arise from poor data sustainability and a lack of proper data management. In this use case and its accompanying tutorial, we present principles of data management as applied to computational phylogenetics and computer-assisted language comparison, and showcase the solutions we recommend. Instead of enumerating the many ways in which linguistic data can be coded and used in a phylogenetic analysis, we illustrate our recommendations with a workflow built around a concrete analysis of a published dataset. We explore the information, file formats, processes, and software involved, and show how to collect and store cross-linguistic information, how to ensure that datasets are cross-linguistically comparable, how to store intermediate and final results of the analyses, and how to share data in a reusable form by relying on the tools and principles of the CLDF (Cross-Linguistic Data Formats) initiative.
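One step in such a workflow is turning cognate judgements stored in a CLDF-style wordlist into a character matrix that phylogenetic software can read. The sketch below is only an illustration under stated assumptions, not the chapter's tutorial code: it assumes a FormTable with the columns `ID`, `Language_ID`, `Parameter_ID`, and `Form`, and a CognateTable with `ID`, `Form_ID`, and `Cognateset_ID`. It codes each cognate set as a binary presence/absence character, marks a character as missing (`?`) when a language has no form for the concept at all, and writes the result as a NEXUS data block. The toy data and the helper names `read_rows`, `binary_matrix`, and `to_nexus` are hypothetical, and the standard-library `csv` module stands in for dedicated CLDF tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy data laid out like a CLDF FormTable (forms.csv).
# lang_c has no form for "water", so its water characters become missing data.
FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
1,lang_a,hand,mana
2,lang_b,hand,mano
3,lang_c,hand,kiru
4,lang_a,water,tala
5,lang_b,water,tale
"""

# Hypothetical cognate judgements laid out like a CLDF CognateTable (cognates.csv).
COGNATES_CSV = """ID,Form_ID,Cognateset_ID
1,1,hand-1
2,2,hand-1
3,3,hand-2
4,4,water-1
5,5,water-1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def binary_matrix(forms, cognates):
    """Code every cognate set as a presence/absence character per language."""
    form_info = {r["ID"]: (r["Language_ID"], r["Parameter_ID"]) for r in forms}
    languages = sorted({lang for lang, _ in form_info.values()})
    attested = set(form_info.values())  # (language, concept) pairs with data
    members = defaultdict(set)  # cognate set -> languages attesting it
    concept_of = {}  # cognate set -> concept it belongs to
    for row in cognates:
        lang, concept = form_info[row["Form_ID"]]
        members[row["Cognateset_ID"]].add(lang)
        concept_of[row["Cognateset_ID"]] = concept
    cogsets = sorted(members)

    def state(lang, cogset):
        # No form for this concept in this language: missing, not absent.
        if (lang, concept_of[cogset]) not in attested:
            return "?"
        return "1" if lang in members[cogset] else "0"

    matrix = {lang: "".join(state(lang, c) for c in cogsets) for lang in languages}
    return cogsets, matrix


def to_nexus(matrix):
    """Serialise a binary matrix as a minimal NEXUS DATA block."""
    nchar = len(next(iter(matrix.values())))
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS="01";',
        "  MATRIX",
    ]
    lines += [f"    {lang} {row}" for lang, row in matrix.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


cogsets, matrix = binary_matrix(read_rows(FORMS_CSV), read_rows(COGNATES_CSV))
print(to_nexus(matrix))
```

Keeping the cognate judgements in their own table, rather than encoding them directly into the matrix, means the matrix can always be regenerated from the published data, and alternative codings (for example, multistate characters per concept) can be derived from the same source files.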