English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Journal Article

Making genealogical language classifications available for phylogenetic analysis: Newick trees, unified identifiers, and branch length

MPS-Authors
/persons/resource/persons34

Dediu,  Dan
Language and Genetics Department, MPI for Psycholinguistics, Max Planck Society;

External Resource

https://github.com/ddediu/lgfam-newick
(Supplementary material)

Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

Dediu_2018_Making_genealogical.pdf
(Publisher version), 767KB

Supplementary Material (public)
There is no public supplementary material available
Citation

Dediu, D. (2018). Making genealogical language classifications available for phylogenetic analysis: Newick trees, unified identifiers, and branch length. Language Dynamics and Change, 8(1), 1-21. doi:10.1163/22105832-00801001.


Cite as: https://hdl.handle.net/11858/00-001M-0000-002D-6219-1
Abstract
One of the best-known types of non-independence between languages is caused by genealogical relationships due to descent from a common ancestor. These can be represented by (more or less resolved and controversial) language family trees. In theory, one can argue that language families should be built through the strict application of the comparative method of historical linguistics, but in practice this is not always the case, and there are several proposed classifications of languages into language families, each with its own advantages and disadvantages. A major stumbling block shared by most of them is that they are relatively difficult to use with computational methods, and in particular with phylogenetics. This is due to their lack of standardization, coupled with the general non-availability of branch length information, which encapsulates the amount of evolution taking place on the family tree. In this paper I introduce a method (and its implementation in R) that converts the language classifications provided by four widely-used databases (Ethnologue, WALS, AUTOTYP and Glottolog) intothe de facto Newick standard generally used in phylogenetics, aligns the four most used conventions for unique identifiers of linguistic entities (ISO 639-3, WALS, AUTOTYP and Glottocode), and adds branch length information from a variety of sources (the tree's own topology, an externally given numeric constant, or a distance matrix). The R scripts, input data and resulting Newick trees are available under liberal open-source licenses in a GitHub repository (https://github.com/ddediu/lgfam-newick), to encourage and promote the use of phylogenetic methods to investigate linguistic diversity and its temporal dynamics.