



Journal Article

Correcting a bias in TIGER rates resulting from high amounts of invariant and singleton cognate sets


List, J. M.
Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Max Planck Society;
CALC, Max Planck Institute for the Science of Human History, Max Planck Society;



List, J. M. (2022). Correcting a bias in TIGER rates resulting from high amounts of invariant and singleton cognate sets. Journal of Language Evolution. doi:10.1093/jole/lzab007.

Cite as: https://hdl.handle.net/21.11116/0000-000A-01BF-E
In a recent issue of the Journal of Language Evolution, Syrjänen et al. (2021) investigate the suitability of computing Cummins and McInerney’s (2011) TIGER rates for estimating the tree-likeness of linguistic datasets compiled for phylogenetic reconstruction. The authors test the TIGER rates on a diverse sample of simulated data, which by and large confirms the usefulness of TIGER rates as an analytic tool for investigating linguistic data. However, they test them on only one real-world dataset, of Uralic languages, which turns out to behave quite differently from the simulated data. When testing the TIGER rates on additional datasets, I detected a bias in the computation that artificially inflates the rates when a dataset contains many characters with invariant or singleton states. To overcome this problem, I suggest a modified variant of TIGER rates, which is provided in the form of a freely available Python package. Testing the modified TIGER scores on the simulated data of Syrjänen et al. shows that the corrected TIGER rates still readily distinguish between different degrees of tree-likeness. Testing them on a dataset in which the number of singletons and invariants was artificially increased further shows that the corrected TIGER rates are not influenced by the bias. A final test on seven linguistic datasets shows the usefulness of the corrected TIGER rates on a larger variety of linguistic datasets and illustrates the importance of taking specific aspects of linguistic data into account when using biological methods in the domain of language evolution.
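
To make the bias concrete, the following is a minimal sketch of the uncorrected TIGER rate as it is commonly described (the rate of a character is its mean partition agreement with all other characters). It is not the code of the Python package mentioned in the abstract and it does not implement the proposed correction. The function names and the toy characters are assumptions chosen for illustration only.

```python
from collections import defaultdict


def partition(character):
    """Group taxon indices by character state (hypothetical helper)."""
    groups = defaultdict(set)
    for taxon, state in enumerate(character):
        groups[state].add(taxon)
    return [frozenset(g) for g in groups.values()]


def partition_agreement(part_i, part_j):
    """Share of sets in part_j that are contained in some set of part_i."""
    hits = sum(1 for b in part_j if any(b <= a for a in part_i))
    return hits / len(part_j)


def tiger_rates(characters):
    """Uncorrected rate per character: mean agreement with all others."""
    parts = [partition(c) for c in characters]
    rates = []
    for i, part_i in enumerate(parts):
        scores = [
            partition_agreement(part_i, part_j)
            for j, part_j in enumerate(parts)
            if j != i
        ]
        rates.append(sum(scores) / len(scores))
    return rates


# Two fully conflicting characters over four taxa (toy data).
base = ["AABB", "ABAB"]
# The same data padded with two invariant and two singleton characters.
padded = base + ["AAAA", "AAAA", "ABCD", "ABCD"]

base_rates = tiger_rates(base)
padded_rates = tiger_rates(padded)

print(sum(base_rates) / len(base_rates))      # mean rate of the base data
print(sum(padded_rates) / len(padded_rates))  # mean rate after padding
```

In this toy setting the two conflicting characters score 0 on their own. After padding, the invariant characters score 1 against everything, and every character agrees fully with the singleton characters, so the mean rate rises although no tree-like signal was added. This is the kind of inflation the abstract describes.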