  MuLan-Methyl: Multiple Transformer-based Language Models for Accurate DNA Methylation Prediction

Zeng, W., Gautam, A., & Huson, D. (2023). MuLan-Methyl: Multiple Transformer-based Language Models for Accurate DNA Methylation Prediction. GigaScience, 12: giad054. doi:10.1093/gigascience/giad054.

Basic data

Genre: Journal article

Creators

Creators:
Zeng, W, Author
Gautam, A [1], Author
Huson, DH [1], Author
Affiliations:
[1] IMPRS From Molecules to Organisms, Max Planck Institute for Biology Tübingen, Max Planck Society, ou_3376132

Content

Keywords: -
Abstract: Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism, and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation, and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep learning framework for predicting DNA methylation sites, which is based on 5 popular transformer-based language models. The framework identifies methylation sites for 3 different types of DNA methylation: N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the "pretrain and fine-tune" paradigm. Pretraining is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning aims at predicting the DNA methylation status of each type. The 5 models are used to collectively predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between different species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to applications in biological sequence analysis and that joint utilization of different language models improves model performance. MuLan-Methyl is open source, and we provide a web server that implements the approach.

Details

Language(s): -
Date: 2023-06; 2023-07
Publication status: Published online
Pages: -
Place, publisher, edition: -
Table of contents: -
Review type: -
Identifiers: DOI: 10.1093/gigascience/giad054
PMID: 37489753
Degree: -

Source 1

Title: GigaScience
Source genre: Journal
Creators:
Affiliations:
Place, publisher, edition: Oxford : Oxford University Press
Pages: 11
Volume / Issue: 12
Article number: giad054
Start / end page: -
Identifier: ISSN: 2047-217X
CoNE: https://pure.mpg.de/cone/journals/resource/2047-217X