Free keywords:
-
Abstract:
The recent surge in applications of machine-learning (ML) algorithms to materials science has demonstrated their potential to predict various properties for the majority of materials inside a given data set. One central aspect of physics, however, is lost in this approach: determining the range of validity, and thus the limitations, of the deduced models, which in materials science corresponds to extracting average and extreme representatives.
By combining clustering, variational autoencoders, and supervised ML algorithms, this work aims to find these two types of representatives and to explore the following questions: Does a given data set have a structure, or subsets of materials that follow different laws than others? Can the data set be reduced substantially such that training the model still yields results of similar quality, and are there stable or unique data points whose inclusion during training is strictly necessary to obtain such a model? How can we estimate whether a new material of unknown target property is likely to be predicted well by our current best analytical model? By answering these questions, we intend to pave the way for an ML-driven search for the ‘needle in the haystack’, with research targeted at promising new materials whose investigated properties differ in the desired way from the rest.
This work is structured as follows: After recapitulating related work on representative data points and defining central terms that are used throughout the thesis, existing ML-algorithm building blocks are presented. Their combinations into the Direct Approach and the Iterative Approach are newly introduced in this work to answer the three core questions above. The designs of both approaches are then presented alongside their respective results. A summary recapitulates the core findings and outlines ideas for future research.