


Journal Article

Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping

Citation

Höllerer, S., Papaxanthos, L., Gumpinger, A. C., Fischer, K., Beisel, C., Borgwardt, K., et al. (2020). Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping. Nature Communications, 11: 3551. doi:10.1038/s41467-020-17222-4.


Cite as: https://hdl.handle.net/21.11116/0000-000C-F089-A
Abstract
Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific recombinase to directly record a GRE’s effect in DNA, enabling readout of both sequence and quantitative function for extremely large GRE-sets via next-generation sequencing. We record translation kinetics of over 300,000 bacterial ribosome binding sites (RBSs) in >2.7 million sequence-function pairs in a single experiment. Further, we introduce a deep learning approach employing ensembling and uncertainty modelling that predicts RBS function with high accuracy, outperforming state-of-the-art methods. DNA-based phenotypic recording combined with deep learning represents a major advance in our ability to predict function from genetic sequence.