Free keywords:
tree balance index, Sackin index, Colless index, total cophenetic index, Yule model, uniform model, phylogenetics
Tree balance plays an important role in phylogenetics and other research areas, which is why several indices for measuring tree balance have been introduced over the years. Nevertheless, a formal definition of what a balance index actually is, and of what makes it a useful measure of balance (or, in other cases, imbalance), has so far been missing from the literature. While the established indices all summarize the (im)balance of a tree in a single number, they vary in their definitions and underlying principles. The aim of the present manuscript is to introduce formal definitions of balance and imbalance indices that classify desirable properties of such indices, and to analyze and categorize established indices accordingly. To this end, we review 19 established (im)balance indices from the literature, summarize their general, statistical, and combinatorial properties (where known), prove numerous additional results, and indicate directions for future research by making open questions and gaps in the literature explicit. We also prove that a few tree shape statistics that have been used to measure tree balance in the literature do not fulfill our definition of an (im)balance index, which might indicate that their properties are not as useful for practical purposes. Moreover, we show that five additional tree shape statistics from other contexts actually are tree (im)balance indices according to our definition. The manuscript is accompanied by a website containing fact sheets of the discussed indices. Finally, we introduce the software package \verb|treebalance|, implemented in R, which can be used to calculate all indices discussed.