



Journal Article

SISSO++: A C++ Implementation of the Sure-Independence Screening and Sparsifying Operator Approach


Purcell,  Thomas
NOMAD, Fritz Haber Institute, Max Planck Society;


Scheffler,  Matthias
NOMAD, Fritz Haber Institute, Max Planck Society;


Carbogno,  Christian
NOMAD, Fritz Haber Institute, Max Planck Society;


Ghiringhelli,  Luca M.
NOMAD, Fritz Haber Institute, Max Planck Society;


Purcell, T., Scheffler, M., Carbogno, C., & Ghiringhelli, L. M. (2022). SISSO++: A C++ Implementation of the Sure-Independence Screening and Sparsifying Operator Approach. The Journal of Open Source Software, 7(71): 3960. doi:10.21105/joss.03960.

Cite as: https://hdl.handle.net/21.11116/0000-0009-7950-5
The sure-independence screening and sparsifying operator (SISSO) approach (Ouyang et al., 2018) is an artificial-intelligence algorithm that combines symbolic regression with compressed sensing. As a symbolic regression method, SISSO identifies the mathematical functions, i.e. the descriptors, that best predict the target property of a data set. Its compressed-sensing aspect allows it to find sparse linear models from tens to thousands of data points. SISSO was introduced for both regression and classification tasks. In practice, SISSO first constructs a large, exhaustive feature space of trillions of potential descriptors: it takes a set of user-provided primary features as a dataframe and iteratively applies a set of unary and binary operators, e.g. addition, multiplication, exponentiation, and squaring, according to a user-defined specification. From this pool of candidate descriptors, the ones most correlated with the target property are identified via sure-independence screening, and from those the low-dimensional linear models with the lowest error are found via an ℓ0 regularization.
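The three stages described above (feature construction, sure-independence screening, and ℓ0 selection) can be illustrated with a minimal NumPy sketch. This is not the SISSO++ API: the function names `build_features`, `sis`, and `l0_fit`, the operator set, and the toy data are all assumptions made for illustration. The sketch applies a single round of operators and a single screening pass, and it treats the ℓ0 step as an exhaustive least-squares search over subsets of the screened features.

```python
import itertools
import numpy as np


def build_features(primary):
    """Apply one round of unary and binary operators to named primary features."""
    feats = dict(primary)
    for name, col in primary.items():
        feats[f"({name})^2"] = col ** 2      # unary: squaring
        feats[f"exp({name})"] = np.exp(col)  # unary: exponentiation
    for (na, a), (nb, b) in itertools.combinations(primary.items(), 2):
        feats[f"({na}+{nb})"] = a + b        # binary: addition
        feats[f"({na}*{nb})"] = a * b        # binary: multiplication
    return feats


def sis(feats, target, n_keep):
    """Keep the n_keep features with the largest |Pearson correlation| to the target."""
    def score(name):
        return abs(np.corrcoef(feats[name], target)[0, 1])
    return sorted(feats, key=score, reverse=True)[:n_keep]


def l0_fit(feats, names, target, dim):
    """Exhaustively fit every dim-feature linear model (with intercept); keep the best."""
    best = None
    for combo in itertools.combinations(names, dim):
        A = np.column_stack([feats[n] for n in combo] + [np.ones_like(target)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        rmse = float(np.sqrt(np.mean((A @ coef - target) ** 2)))
        if best is None or rmse < best[0]:
            best = (rmse, combo, coef)
    return best


# Toy data (hypothetical): the target depends on the product of two primary features.
rng = np.random.default_rng(0)
primary = {k: rng.uniform(1.0, 2.0, 50) for k in ("a", "b", "c")}
target = 3.0 * primary["a"] * primary["b"] + 0.5

feats = build_features(primary)
selected = sis(feats, target, n_keep=5)
rmse, combo, coef = l0_fit(feats, selected, target, dim=1)
print(combo, coef, rmse)
```

On this toy problem the product feature correlates perfectly with the target, so the screening step ranks it first and the one-dimensional search recovers it. The exhaustive search grows combinatorially with the number of screened features, which is why the screening step matters in practice.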
Because symbolic regression generates an interpretable equation, it has become an increasingly popular concept across scientific disciplines (Neumann et al., 2020; Udrescu & Tegmark, 2020; Wang et al., 2019). A particular advantage of these approaches is their capability to model complex phenomena using relatively simple descriptors. SISSO has been used successfully to model, explore, and predict important material properties, including the stability of different phases (Bartel et al., 2018; Schleder et al., 2020); catalytic activity and reactivity (Andersen et al., 2019; Andersen & Reuter, 2021; Han et al., 2021; W. Xu et al., 2021); and glass transition temperatures (Pilania et al., 2019). Beyond regression problems, SISSO has also been used successfully to classify materials into different crystal prototypes (Ouyang et al., 2019), to determine whether a material crystallizes in its ground state as a perovskite (Bartel et al., 2019), and to determine whether a material is a topological insulator (Cao et al., 2020).
The SISSO++ package is an open-source (Apache-2.0 licence), modular, and extensible C++ implementation of the SISSO method with Python bindings. Specifically, SISSO++ applies this methodology to regression, log regression, and classification problems. Additionally, the library includes multiple Python functions to facilitate the post-processing, analysis, and visualization of the resulting models.
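The text does not define log regression. One common reading, assumed here, is a power-law model fitted by linear least squares on the logarithms of the target and the descriptors. The sketch below illustrates that reading only; the function `fit_power_law` and the data are hypothetical and do not reflect how SISSO++ implements this problem type.

```python
import numpy as np


def fit_power_law(features, target):
    """Fit target ~ c0 * prod_i features[:, i] ** a[i] by least squares in log space.

    Assumes all feature and target values are strictly positive.
    """
    A = np.column_stack([np.log(features), np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(A, np.log(target), rcond=None)
    return float(np.exp(coef[-1])), coef[:-1]


# Hypothetical data following an exact power law in two descriptors.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 3.0, size=(40, 2))
y = 2.0 * X[:, 0] ** 1.5 * X[:, 1] ** -0.5

c0, exponents = fit_power_law(X, y)
print(c0, exponents)
```

Taking logarithms turns the multiplicative model into a linear one, so the same sparse linear machinery used for ordinary regression applies to the transformed problem.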