Keywords:
-
Abstract:
The evolution of drug resistance in HIV is characterized by the accumulation of
resistance-associated mutations in the HIV genome. Mutagenetic trees, a family
of restricted Bayesian tree models, have been applied to infer the order and
rate of occurrence of these mutations. Understanding and predicting this
evolutionary process is an important prerequisite for the rational design of
antiretroviral therapies. In practice, mixture models of K mutagenetic trees
provide more flexibility and are often more appropriate for modelling observed
mutational patterns.
Here, we investigate the model selection problem for K-mutagenetic trees
mixture models. We evaluate several classical model selection criteria
including cross-validation, the Bayesian Information Criterion (BIC), and the
Akaike Information Criterion. We also use the empirical Bayes method by
constructing a prior probability distribution for the parameters of a
mutagenetic trees mixture model and deriving the posterior probability of the
model. In addition to the model dimension, we consider the redundancy of a
mixture model, which is measured by comparing the topologies of trees within a
mixture model. Based on the redundancy, we propose a new model selection
criterion, which is a modification of the BIC.
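
For readers unfamiliar with the classical criteria, the following is a minimal
sketch of how AIC and BIC could be used to choose the number of tree components
K. It uses only the textbook definitions; the redundancy-based BIC modification
proposed here is not specified in this abstract and is therefore not
reproduced. The function names, the candidate log-likelihoods, and the
parameter counts are all hypothetical and serve only as illustration.

```python
import math


def aic(log_lik, n_params):
    """Textbook Akaike Information Criterion (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_samples):
    """Textbook Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_lik


def select_k(candidates, n_samples, criterion="bic"):
    """Pick the number of components K with the lowest criterion value.

    candidates maps K to a (maximized log-likelihood, number of free
    parameters) pair. How these are obtained for a mutagenetic trees
    mixture is assumed to be handled elsewhere.
    """
    scores = {}
    for k, (log_lik, n_params) in candidates.items():
        if criterion == "bic":
            scores[k] = bic(log_lik, n_params, n_samples)
        elif criterion == "aic":
            scores[k] = aic(log_lik, n_params)
        else:
            raise ValueError(f"unknown criterion: {criterion}")
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Made-up fits for K = 1, 2, 3 on a hypothetical data set of 500 samples.
fits = {1: (-1200.0, 10), 2: (-1150.0, 21), 3: (-1135.0, 32)}
best_bic, _ = select_k(fits, n_samples=500, criterion="bic")
best_aic, _ = select_k(fits, n_samples=500, criterion="aic")
print("BIC selects K =", best_bic)
print("AIC selects K =", best_aic)
```

Because the BIC penalty grows with the logarithm of the sample size, it charges
more per parameter than AIC once the data set has more than about eight
samples, so on the same fits it can favour fewer components.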
Experimental results on simulated and on real HIV data show that the classical
criteria tend to select models with far too many tree components. Only
cross-validation and the modified BIC recover the correct number of trees and
the tree topologies most of the time. While the two perform equally well, the
runtime of the new BIC modification is about one order of magnitude lower than
that of cross-validation.
Thus, this model selection criterion can also be used for large data sets for
which cross-validation becomes computationally infeasible.