Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes

Vaculík, Ondřej; Chalupová, Eliška; Grešová, Katarína; Majtner, Tomáš; Alexiou, Panagiotis

doi:10.3390/biology12101276

Journal Article

MPS-Authors

Majtner, Tomáš
Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czech Republic;
Department of Molecular Sociology, Max Planck Institute of Biophysics, Max Planck Society;

Citation

Vaculík, O., Chalupová, E., Grešová, K., Majtner, T., & Alexiou, P. (2023). Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes. Biology, 12(10): 1276. doi:10.3390/biology12101276.

Cite as: https://hdl.handle.net/21.11116/0000-000D-D9EA-7

Abstract

RNA-binding proteins are vital regulators in numerous biological processes. Their dysfunction can result in diverse diseases, such as cancer or neurodegenerative disorders, which makes predicting their binding sites highly important. Deep learning (DL) has brought about a revolution in various biological domains, including the field of protein–RNA interactions. Nonetheless, several challenges persist, such as the limited availability of experimentally validated binding sites to train well-performing DL models for the majority of proteins. Here, we present a novel training approach based on transfer learning (TL) to address the issue of limited data. Employing a sophisticated and interpretable architecture, we compare the performance of our method trained using two distinct approaches: training from scratch (SCR) and utilizing TL. Additionally, we benchmark our results against the current state-of-the-art methods. Furthermore, we tackle the challenges associated with selecting appropriate input features and determining optimal interval sizes. Our results show that TL enhances model performance, particularly in datasets with minimal training data, where satisfactory results can be achieved with just a few hundred RNA binding sites. Moreover, we demonstrate that integrating both sequence and evolutionary conservation information leads to superior performance. Finally, we show how incorporating an attention layer into the model facilitates the interpretation of predictions within a biologically relevant context.