Multi-task learning for pK_a prediction

Skolidis, Grigorios; Hansen, Katja; Sanguinetti, Guido; Rupp, Matthias

doi:10.1007/s10822-012-9582-x

Skolidis, G., Hansen, K., Sanguinetti, G., & Rupp, M. (2012). Multi-task learning for pK_a prediction. Journal of Computer-Aided Molecular Design, 26(7), 883-895. doi:10.1007/s10822-012-9582-x.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-0010-76DA-7
Version Permalink: https://hdl.handle.net/11858/00-001M-0000-0010-76DC-3

Genre: Journal Article

Creators

Creators:
Skolidis, Grigorios^{1}, Author
Hansen, Katja^{2, 3}, Author
Sanguinetti, Guido^{4}, Author
Rupp, Matthias^{3, 5}, Author

Affiliations:
1 Department of Statistical Science, University College London, Gower Street, London WC1E 6BT, UK
2 Theory, Fritz Haber Institute, Max Planck Society, Faradayweg 4-6, 14195 Berlin, DE
3 Machine Learning Group, TU Berlin, Franklinstr. 28/29, 10587 Berlin, Germany
4 School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB Edinburgh, Scotland
5 Institute of Pharmaceutical Sciences, ETH Zurich, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland

Content

Free keywords: -

Abstract: Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. If data for related classes are available, we show that multi-task learning can be used to improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties where few measurements are available.
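
Illustration (not part of the original record): the three model types named in the abstract can be compared through the task covariance matrix B of the intrinsic model of co-regionalization, where the covariance between two (compound, class) pairs is B[t, t'] times a linear kernel on the descriptors. The sketch below is a minimal NumPy version under stated assumptions: the data are synthetic, the names `icm_kernel`, `gp_posterior_mean`, and all `B_*` matrices are hypothetical, B is fixed by hand rather than learned, and the incomplete Cholesky parameterization mentioned in the abstract is not shown. It is not the authors' implementation.

```python
import numpy as np

def icm_kernel(X1, t1, X2, t2, B):
    """ICM covariance with a linear input kernel: K = B[t, t'] * <x, x'>."""
    return B[np.ix_(t1, t2)] * (X1 @ X2.T)

def gp_posterior_mean(X_tr, t_tr, y_tr, X_te, t_te, B, noise=0.1):
    """Posterior mean of a zero-mean GP with Gaussian observation noise."""
    K = icm_kernel(X_tr, t_tr, X_tr, t_tr, B)
    K_star = icm_kernel(X_te, t_te, X_tr, t_tr, B)
    alpha = np.linalg.solve(K + noise * np.eye(len(y_tr)), y_tr)
    return K_star @ alpha

# Synthetic stand-in for two related classes (assumption, not pKa data):
# class 0 has 30 samples, class 1 only 2, with slightly different weights.
rng = np.random.default_rng(0)
n_features = 4
w_shared = rng.normal(size=n_features)
W_true = np.stack([w_shared, w_shared + 0.1 * rng.normal(size=n_features)])

X_train = rng.normal(size=(32, n_features))
t_train = np.array([0] * 30 + [1] * 2)
y_train = np.einsum("ij,ij->i", X_train, W_true[t_train])
y_train = y_train + 0.05 * rng.normal(size=len(t_train))

X_test = rng.normal(size=(5, n_features))
t_test = np.ones(5, dtype=int)  # predict for the data-poor class

# Task covariance choices (hand-picked, not learned):
B_single = np.eye(2)                      # no sharing between classes
B_pool = np.ones((2, 2))                  # classes treated as identical
B_multi = np.ones((2, 2)) + 0.1 * np.eye(2)  # W W^T + diag(kappa), rank-1 W

pred_single = gp_posterior_mean(X_train, t_train, y_train, X_test, t_test, B_single)
pred_pool = gp_posterior_mean(X_train, t_train, y_train, X_test, t_test, B_pool)
pred_multi = gp_posterior_mean(X_train, t_train, y_train, X_test, t_test, B_multi)
```

With B equal to the identity the cross-class covariance vanishes and each class is modelled separately; with an all-ones B the model reduces to pooling; intermediate choices let the data-poor class borrow strength from the related class.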

Details

Language(s): eng - English

Dates: Submitted: 2011-11-15; Accepted: 2012-05-11; Published Online: 2012-06-20

Publication Status: Published online

Pages: 13

Publishing info: -

Table of Contents: -

Rev. Type: Peer

Identifiers: DOI: 10.1007/s10822-012-9582-x

Degree: -

Source 1

Title: Journal of Computer-Aided Molecular Design

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: Leiden, The Netherlands : ESCOM Science Publishers

Pages: -
Volume / Issue: 26 (7)
Sequence Number: -
Start / End Page: 883 - 895
Identifier: ISSN: 0920-654X
CoNE: https://pure.mpg.de/cone/journals/resource/954925564670