Abstract
For a systematic discovery of molecular crystal structures with customized properties, efficient search strategies are desired. Conveniently, these structure searches can be delegated to a machine by means of computational chemistry. Its tools make it possible to quantify the stability of an atomistic structure by computing its energy, and experimentally observed crystals are associated with the most stable structural arrangements. Consequently, successful in silico molecular crystal structure prediction (CSP) amounts to solving a global optimization problem. For a given molecule, however, reliable predictions of the most stable structural arrangements face a major challenge: the search spaces that need to be explored are vast, and computationally expensive high levels of theory are needed to resolve the typically small stability differences between crystal candidates.
To arrive at such solutions efficiently, structure-energy relationships need to be evaluated with both high accuracy and low computational cost. To this end, an approach has been developed in this work to generate accurate hybrid models for molecular crystals that feature short evaluation times. These hybrid models are composed of a computationally inexpensive physics-based description of long-range interactions at the density-functional tight-binding (DFTB) level and a short-range correction that reproduces highly accurate first-principles target methods (based on density-functional theory or wavefunction methods). The short-range correction is generated by a kernel-based supervised machine learning (ML) strategy developed to yield system-specific ∆-ML corrections that augment the DFTB baseline description.
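As a purely illustrative sketch of the ∆-ML idea described above, and not the implementation used in this work, the following toy example learns the difference between a cheap baseline and an expensive target with Gaussian kernel ridge regression. The functions `baseline_energy` and `target_energy`, the one-dimensional descriptor, and all parameter values are hypothetical stand-ins for DFTB, the first-principles reference, and a real structural representation.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale=0.5):
    """Gaussian kernel matrix between two sets of 1-D descriptors."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def baseline_energy(x):
    """Hypothetical cheap baseline (stand-in for DFTB)."""
    return x ** 2

def target_energy(x):
    """Hypothetical expensive reference (stand-in for DFT or wavefunction methods)."""
    return x ** 2 + 0.3 * np.sin(3.0 * x)

def fit_delta_model(x_train, regularization=1e-8):
    """Fit kernel ridge regression weights on the residual target - baseline."""
    residual = target_energy(x_train) - baseline_energy(x_train)
    k = gaussian_kernel(x_train, x_train)
    return np.linalg.solve(k + regularization * np.eye(len(x_train)), residual)

def hybrid_energy(x, x_train, weights):
    """Hybrid prediction: baseline plus the learned correction."""
    return baseline_energy(x) + gaussian_kernel(x, x_train) @ weights

# Toy data: 15 reference evaluations, tested on points not used for training.
x_train = np.linspace(-2.0, 2.0, 15)
weights = fit_delta_model(x_train)
x_test = np.linspace(-1.9, 1.9, 50)

hybrid_error = np.max(np.abs(hybrid_energy(x_test, x_train, weights) - target_energy(x_test)))
baseline_error = np.max(np.abs(baseline_energy(x_test) - target_energy(x_test)))
```

In this toy setting the model only has to learn the small residual rather than the full energy, which is the reason a physically motivated baseline can reduce the amount of reference data needed.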
Given the considerable computational cost of evaluating reference structures, the developed training procedure for ∆-ML models is designed for high data efficiency. In this regard, the training benefits from the DFTB baseline, whose description captures a significant part of the interactions relevant to molecular crystals, so that these do not need to be learned explicitly from data. A diversity-driven selection of appropriate structures further reduces the amount of reference data required to be representative of the intended application of the model to molecular CSP.
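One common way to realize a diversity-driven selection is farthest point sampling, shown here as a minimal sketch. The text above does not state which selection algorithm is used, so the choice of farthest point sampling, the Euclidean distance, and the toy feature vectors are assumptions made for illustration only.

```python
import numpy as np

def farthest_point_sampling(features, n_select, start=0):
    """Greedily pick structures that are maximally distant from those already chosen.

    features: array of shape (n_structures, n_features), hypothetical descriptors.
    Returns the list of selected indices in selection order.
    """
    features = np.asarray(features, dtype=float)
    selected = [start]
    # Distance of every structure to its nearest already-selected structure.
    min_dist = np.linalg.norm(features - features[start], axis=1)
    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected

# Toy descriptors: three near-duplicates and two outliers.
toy_features = np.array([[0.0], [0.1], [0.2], [5.0], [10.0]])
chosen = farthest_point_sampling(toy_features, n_select=3)
```

Starting from the first structure, the sketch picks the two outliers before any of the near-duplicates, so redundant structures are not sent to the expensive reference method.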
For single-component molecular crystals, the obtained hybrid models are shown to accurately reproduce the description of the high-level reference method at a fraction of the computational cost. Beyond that, the models are differentiable, which allows for efficient local structure optimization and thus significantly reduces the cost of the computationally most expensive part of typical molecular CSP studies. The approach has been shown to be broadly applicable to various types of single-component molecular crystals and the corresponding interactions.
An extension of this approach generalizes it to (neutral) multi-component crystals, which are of great practical relevance for targeted searches for materials with application-specific properties. In this context, the robustness of the corresponding ∆-ML models is substantiated, inter alia, by performing molecular dynamics simulations at ambient conditions on co-crystal structures outside the scope of the reference structures used for their generation. Here, the predicted co-crystal densities have been verified by direct comparison with experimental measurements.
Apart from this, the versatile applicability of kernel-based unsupervised learning for gaining insights into data sets of atomistic structures and associated attributes has been illustrated. Here, atomic environments and entire structures have been described by a sophisticated representation, while mutual relations between them have been measured and projected to a low-dimensional space by means of kernel principal component analysis. Tools for performing these mappings, subsequent visualization, and interactive exploration are provided in the course of the presented work, along with illustrative examples that showcase various fields of application, such as the analysis of molecular dynamics trajectories, the results of a crystal structure search, or information associated with atomic environments in a well-established molecular database.
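The projection step can be sketched with a minimal NumPy version of kernel principal component analysis. This is an illustration under stated assumptions rather than the tooling provided with this work: the function name `kernel_pca` is hypothetical, and a linear kernel on random toy vectors stands in for the structural similarity kernel.

```python
import numpy as np

def kernel_pca(kernel_matrix, n_components=2):
    """Project samples to a low-dimensional space from a precomputed kernel matrix."""
    k = np.asarray(kernel_matrix, dtype=float)
    n = k.shape[0]
    # Center the kernel in feature space.
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    k_centered = centering @ k @ centering
    # Eigen-decompose and keep the leading components.
    eigvals, eigvecs = np.linalg.eigh(k_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    leading = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    # Coordinates of each sample along the leading kernel principal components.
    return eigvecs[:, order] * np.sqrt(leading)

# Toy example: 6 samples with 2 hypothetical features and a linear kernel.
rng = np.random.default_rng(0)
toy_descriptors = rng.normal(size=(6, 2))
toy_kernel = toy_descriptors @ toy_descriptors.T
projection = kernel_pca(toy_kernel, n_components=2)
```

Each row of `projection` gives the two-dimensional coordinates of one sample, which can then be plotted and colored by an attribute of interest such as energy.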