




Identifying exceptional data points in materials science using machine learning


Oehlers, Melina
NOMAD, Fritz Haber Institute, Max Planck Society;


Oehlers, M. (2021). Identifying exceptional data points in materials science using machine learning. Master Thesis, Technische Universität, Berlin.

Cite as: https://hdl.handle.net/21.11116/0000-000A-FBFE-E
The recent surge in applications of machine-learning (ML) algorithms to materials science has shown their potential for predicting various properties of the majority of materials in a given data set. One central aspect of physics, however, is lost in this approach: determining the range of validity, and thus the limitations, of the deduced models, which in materials science corresponds to extracting average and extreme representatives.
By combining clustering, variational autoencoders, and supervised ML algorithms, this work aims to find these two types of representatives and explore the following aspects: Does a given data set have a structure, or subsets of materials that follow different laws than others? Can the data set be reduced substantially, such that training the model still yields results of similar quality, and are there stable or unique data points whose inclusion during training is strictly necessary to obtain such a model? How can we estimate whether a new material of unknown target property is likely to be predicted well by our current best analytical model? By answering these questions, we intend to pave the way for an ML-driven search for the 'needle in the haystack', with research targeted at promising new materials whose investigated properties differ in the desired way from the rest.
This work is structured as follows: After recapitulating related work on representative data points and defining central terms used throughout, the thesis presents existing ML-algorithm building blocks. Their combinations into the Direct Approach and the Iterative Approach are newly introduced in this work to answer the three core questions above. The designs of both approaches are then presented alongside their respective results. A summary recapitulates the core findings and outlines ideas for future research.