Identifying exceptional data points in materials science using machine learning

Oehlers, Melina


Oehlers, M. (2021). Identifying exceptional data points in materials science using machine learning. Master Thesis, Technische Universität, Berlin.

Item is Released


Basic


Item Permalink: https://hdl.handle.net/21.11116/0000-000A-FBFE-E
Version Permalink: https://hdl.handle.net/21.11116/0000-000C-7DA8-B

Genre: Thesis

Files


OehlersMilena_master_thesis_sub.pdf (Any fulltext), 3MB


File Permalink:
https://hdl.handle.net/21.11116/0000-000A-FC00-A

Name:
OehlersMilena_master_thesis_sub.pdf

Description:
-

OA-Status:
Miscellaneous

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]


Copyright Date:
-

Copyright Info:
-

License:
-


Creators


Creators:
Oehlers, Melina¹, Author
Knorr, Andreas, Referee
Scheffler, Matthias¹, Referee

Affiliations:
¹ NOMAD, Fritz Haber Institute, Max Planck Society, ou_3253022

Content


Free keywords: -

Abstract: The recent surge in applications of machine-learning (ML) algorithms to materials science has shown their potential to predict various properties for the majority of materials in a given data set. One central aspect of physics, however, is lost in this approach: determining the range of validity, and thus the limitations, of the deduced models, which in materials science corresponds to extracting average and extreme representatives.
By combining clustering, variational autoencoders, and supervised ML algorithms, this work aims to find these two types of representatives and to explore the following questions. Does a given data set have a structure, or subsets of materials that follow different laws than others? Can the data set be reduced substantially such that training the model still yields results of similar quality, and are there stable or unique data points whose inclusion during training is strictly necessary to obtain such a model? How can we estimate whether a new material of unknown target property is likely to be predicted well by our current best analytical model? By answering these questions, we intend to pave the way for an ML-driven search for the 'needle in the haystack', with research targeted at promising new materials whose investigated properties differ in the desired way from the rest.
This work is structured as follows. After recapitulating related work on representative data points and defining central terms used throughout the thesis, we present existing ML-algorithm building blocks, whose combinations into the Direct Approach and the Iterative Approach are newly introduced in this work to answer the three core questions above. The designs of both approaches are then presented alongside their respective results. A summary recapitulates the core findings and outlines ideas for future research.

Details


Language(s): eng - English

Dates: Accepted: 2021

Publication Status: Accepted / In Press

Pages: V, 59

Publishing info: Berlin : Technische Universität

Table of Contents: -

Rev. Type: -

Identifiers: -

Degree: Master
