Journal Article

Crowd-sourcing materials-science challenges with the NOMAD 2018 Kaggle competition

MPS-Authors

Sutton, Christopher A.
NOMAD, Fritz Haber Institute, Max Planck Society

Ghiringhelli, Luca M.
NOMAD, Fritz Haber Institute, Max Planck Society

Liu, Xiangyue
NOMAD, Fritz Haber Institute, Max Planck Society

Ziletti, Angelo
NOMAD, Fritz Haber Institute, Max Planck Society

Scheffler, Matthias
NOMAD, Fritz Haber Institute, Max Planck Society

Fulltext (public)

arXiv:1812.00085.pdf
(Preprint), 5MB

s41524-019-0239-3.pdf
(Publisher version), 2MB

Supplementary Material (public)

41524_2019_239_MOESM1_ESM.pdf
(Supplementary material), 2MB

Citation

Sutton, C. A., Ghiringhelli, L. M., Yamamoto, T., Lysogorskiy, Y., Blumenthal, L., Hammerschmidt, T., et al. (2019). Crowd-sourcing materials-science challenges with the NOMAD 2018 Kaggle competition. npj Computational Materials, 5: 111. doi:10.1038/s41524-019-0239-3.


Cite as: https://hdl.handle.net/21.11116/0000-0003-02D1-E
Abstract
A public data-analytics competition was organized by the Novel Materials Discovery (NOMAD) Centre of Excellence and hosted on the online platform Kaggle, using a dataset of 3,000 (Al_x Ga_y In_{1-x-y})_2 O_3 compounds. Its aim was to identify the best machine-learning (ML) model for predicting two key physical properties relevant to optoelectronic applications: the electronic bandgap energy and the crystalline formation energy. Here, we present a summary of the top-three ranked ML approaches. The first-place solution was based on a crystal-graph representation that is novel for the ML of properties of materials. The second-place model combined many candidate descriptors from a set of compositional, atomic-environment-based, and average structural properties with the light gradient-boosting machine regression model. The third-place model employed the smooth overlap of atomic positions representation with a neural network. The Pearson correlation among the prediction errors of nine ML models (obtained by combining the top-three ranked representations with all three employed regression models) was examined to gain insight into whether the representation or the regression model determines the overall model performance. Ensembling relatively decorrelated models (based on the Pearson correlation) leads to an even higher prediction accuracy.
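
The error-correlation analysis and ensembling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the model names (`model_a`, `model_b`, `model_c`) and helper functions are hypothetical, and a plain root-mean-square error stands in for whatever metric the competition used. The sketch only shows the general idea that averaging two models with weakly correlated errors tends to reduce the overall error.

```python
import numpy as np


def error_correlation(y_true, predictions):
    """Return model names and the Pearson correlation matrix of their errors.

    `predictions` maps a (hypothetical) model name to its predicted values.
    """
    names = list(predictions)
    # One row of signed errors per model; np.corrcoef treats rows as variables.
    errors = np.array([np.asarray(predictions[n]) - y_true for n in names])
    return names, np.corrcoef(errors)


def rmse(y_true, y_pred):
    """Root-mean-square error (an assumed stand-in metric for illustration)."""
    return float(np.sqrt(np.mean((np.asarray(y_pred) - y_true) ** 2)))


# Synthetic illustration only: no values here come from the competition.
rng = np.random.default_rng(0)
y_true = rng.normal(size=500)
shared_error = rng.normal(scale=0.1, size=500)

predictions = {
    # model_a and model_b share almost the same error, so they are correlated.
    "model_a": y_true + shared_error,
    "model_b": y_true + shared_error + rng.normal(scale=0.02, size=500),
    # model_c has an independent error of similar size.
    "model_c": y_true + rng.normal(scale=0.1, size=500),
}

names, corr = error_correlation(y_true, predictions)

# Average the two models whose errors are least correlated.
ensemble_ac = 0.5 * (predictions["model_a"] + predictions["model_c"])

print(names)
print(np.round(corr, 2))
print("RMSE a:", rmse(y_true, predictions["model_a"]))
print("RMSE c:", rmse(y_true, predictions["model_c"]))
print("RMSE ensemble(a, c):", rmse(y_true, ensemble_ac))
```

In this toy setup the errors of `model_a` and `model_b` are strongly correlated, so averaging them would gain little, whereas averaging `model_a` with the decorrelated `model_c` lowers the error below that of either model alone.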