Meeting Abstract

Beware of circularity: A critical assessment of the state of the art in deleteriousness prediction of missense variants

MPS-Authors
Grimm, D
Department Molecular Biology, Max Planck Institute for Developmental Biology, Max Planck Society;

Borgwardt, K
Department Molecular Biology, Max Planck Institute for Developmental Biology, Max Planck Society;

Citation

Azencott, C., Grimm, D., Smoller, J., Duncan, L., & Borgwardt, K. (2014). Beware of circularity: A critical assessment of the state of the art in deleteriousness prediction of missense variants. In 64th Annual Meeting of the American Society of Human Genetics (ASHG 2014) (p. 56).


Cite as: https://hdl.handle.net/21.11116/0000-000B-40FB-2
Abstract
Discrimination between disease-causing missense mutations and neutral
polymorphisms is a key challenge in current sequencing studies. It is
therefore critical to be able to evaluate fairly and without bias the
performance of the many in silico predictors of deleteriousness. However,
current analyses of such tools and their combinations are liable to suffer
from the effects of circularity, which occurs when predictors are evaluated
on data that are not independent of those used to build them, and which may
lead to overly optimistic results. Circularity can first stem from the
overlap between training and evaluation datasets, which may result in the
well-studied phenomenon of overfitting: a tool that is too tailored to a
given dataset will be more likely than others to perform well on that set,
but incurs the risk of failing more heavily at classifying novel variants.
Second, we find that circularity may result from an investigation bias in
the way mutation databases are populated: in most cases, all the variants of
the same protein are annotated with the same (neutral or pathogenic) status.
Furthermore, proteins containing only deleterious SNVs comprise many more
labeled variants than their counterparts containing only neutral SNVs.
Ignoring this, we find that assigning a variant the same status as that of
its closest variant on the genomic sequence outperforms all state-of-the-art
tools. Given these barriers to valid assessment of the performance of
deleteriousness prediction tools, we employ approaches that avoid
circularity, and hence provide independent evaluation of ten state-of-the-art
tools and their combinations. Our detailed analysis provides scientists with
critical insights to guide their choice of tool as well as the future
development of new methods for deleteriousness prediction. In particular, we
demonstrate that the performance of FatHMM-W relies mostly on the knowledge
of the labels of neighboring variants, which may hinder its ability to
annotate variants in the less explored regions of the genome. We also find
that PolyPhen2 performs as well as or better than all other tools at
discriminating between cases and controls in a novel autism-relevant dataset.
Based on our findings about the mutation databases available for training
deleteriousness prediction tools, we predict that retraining PolyPhen2
features on the Varibench dataset will yield even better performance, and we
show that this is true for the autism-relevant dataset.
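
The nearest-variant baseline mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(chromosome, position, label)` record layout, and the toy data are all assumptions made for illustration. Each variant receives the label of the closest other labeled variant on the same chromosome, so the prediction uses no biological feature at all.

```python
from collections import defaultdict


def nearest_variant_predictions(variants):
    """Predict each variant's label from its closest labeled neighbour.

    `variants` is a list of (chromosome, position, label) tuples; this
    layout is a hypothetical one chosen for the sketch. A variant with no
    other labeled variant on its chromosome gets the prediction None.
    """
    by_chrom = defaultdict(list)
    for idx, (chrom, pos, label) in enumerate(variants):
        by_chrom[chrom].append((pos, idx, label))

    predictions = [None] * len(variants)
    for members in by_chrom.values():
        for pos, idx, _ in members:
            # Distance to every other variant on the same chromosome.
            neighbours = [
                (abs(pos - other_pos), other_label)
                for other_pos, other_idx, other_label in members
                if other_idx != idx
            ]
            if neighbours:
                # Copy the label of the closest neighbour (first one on ties).
                predictions[idx] = min(neighbours, key=lambda n: n[0])[1]
    return predictions


def accuracy(variants, predictions):
    """Fraction of correct predictions among variants that received one."""
    scored = [
        (label, pred)
        for (_, _, label), pred in zip(variants, predictions)
        if pred is not None
    ]
    if not scored:
        return None
    return sum(label == pred for label, pred in scored) / len(scored)


# Invented toy data: 1 = pathogenic, 0 = neutral. Labels cluster by region,
# mimicking the database bias described in the abstract.
toy_variants = [
    ("chr1", 100, 1), ("chr1", 105, 1), ("chr1", 110, 1),
    ("chr2", 500, 0), ("chr2", 520, 0),
    ("chr3", 900, 1),
]

if __name__ == "__main__":
    preds = nearest_variant_predictions(toy_variants)
    print(preds, accuracy(toy_variants, preds))
```

On data where labels cluster by region, as in this toy set, such a baseline scores well purely because neighboring variants share a status. That is the kind of inflated result an evaluation has to rule out, for example by keeping all variants of one protein on the same side of the training and evaluation split.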