Identifying domains of applicability of machine learning models for materials science

Sutton, Christopher A.; Boley, Mario; Ghiringhelli, Luca M.; Rupp, Matthias; Vreeken, Jilles; Scheffler, Matthias

doi:10.26434/chemrxiv.9778670

Please note that a newer version of this item is available:
https://pure.mpg.de/pubman/item/item_3164134_9

Released

Journal Article

MPS-Authors

Sutton, Christopher A.
NOMAD, Fritz Haber Institute, Max Planck Society;

Ghiringhelli, Luca M.
NOMAD, Fritz Haber Institute, Max Planck Society;

Rupp, Matthias
Citrine Informatics;
NOMAD, Fritz Haber Institute, Max Planck Society;

Scheffler, Matthias
Physics Department and IRIS Adlershof, Humboldt-Universität;
NOMAD, Fritz Haber Institute, Max Planck Society;

Fulltext (public)

manuscript.domain_of_app.pdf
(Preprint), 9MB

s41467-020-17112-9.pdf
(Publisher version), 6MB

Citation

Sutton, C. A., Boley, M., Ghiringhelli, L. M., Rupp, M., Vreeken, J., & Scheffler, M. (2020). Identifying domains of applicability of machine learning models for materials science. Nature Communications, 11: 4428. doi:10.26434/chemrxiv.9778670.

Cite as: https://hdl.handle.net/21.11116/0000-0004-AA7A-4

Abstract

We present an extension to the usual machine learning process that allows for the identification of the domain of applicability of a fitted model, i.e., the region in its domain where it performs most accurately. This approach is applied to several vastly different but commonly used materials representations (namely the n-gram approach, SOAP, and the many-body tensor representation), which are practically indistinguishable based on performance using a single error statistic. Moreover, these models appear unsatisfactory for screening applications, as they fail to reliably identify the ground-state polymorphs. When applying our newly developed analysis to each of the models, we can identify its domain of applicability according to a simple set of interpretable conditions. We show that identifying the domain of applicability in the prediction of the formation energy enables a more accurate ground-state search, a crucial step for the discovery of novel materials.
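The abstract does not spell out how the interpretable conditions are found, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes that per-sample absolute errors of a fitted model are already available, and it searches single-threshold conditions on one feature for the subset that best trades off coverage against error reduction. The function name `find_domain`, the parameter `min_coverage`, the scoring rule, and the toy data are all hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def find_domain(
    features: List[List[float]],
    errors: List[float],
    min_coverage: float = 0.2,
) -> Optional[Tuple[float, int, str, float]]:
    """Return (score, feature_index, operator, threshold) for the best
    single condition, or None if no condition qualifies.

    Assumed scoring rule (hypothetical): coverage * relative error reduction.
    """
    n = len(errors)
    global_err = mean(errors)
    if global_err == 0:
        return None  # the model is already exact everywhere

    best = None
    for j in range(len(features[0])):
        # Candidate thresholds are the observed values of feature j.
        for t in sorted({row[j] for row in features}):
            for op in ("<=", ">"):
                selected = [
                    e
                    for row, e in zip(features, errors)
                    if (row[j] <= t if op == "<=" else row[j] > t)
                ]
                if not selected or len(selected) / n < min_coverage:
                    continue
                coverage = len(selected) / n
                reduction = (global_err - mean(selected)) / global_err
                score = coverage * reduction
                if best is None or score > best[0]:
                    best = (score, j, op, t)
    return best


# Toy data (invented): one feature, errors are small for low feature values.
features = [[0.1], [0.2], [0.3], [0.8], [0.9]]
errors = [0.01, 0.02, 0.01, 0.5, 0.6]

best = find_domain(features, errors)
if best is not None:
    score, j, op, t = best
    print(f"candidate domain: x{j} {op} {t}")
```

In this sketch a condition is only as trustworthy as the error estimates behind it, so the errors would normally come from held-out data rather than the training set.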