@inproceedings{yang-li-2024-mp,
    title = "{MP}-{RNA}: Unleashing Multi-species {RNA} Foundation Model via Calibrated Secondary Structure Prediction",
    author = "Yang, Heng  and
      Li, Ke",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.findings-emnlp.304/",
    doi = "10.18653/v1/2024.findings-emnlp.304",
    pages = "5278--5296",
    abstract = "RNA foundation models (FMs) have been extensively used to interpret genomic sequences and address a wide range of in-silico genomic tasks. However, current RNA FMs often overlook the incorporation of secondary structures in the pretraining of FMs, which impedes the effectiveness in various genomic tasks. To address this problem, we leverage filtered high-fidelity structure annotations for structure pretraining to enhance the modeling ability of FMs in single nucleotide resolution tasks. Experimental evaluations across four comprehensive genomic benchmarks demonstrate that our RNA FM consistently outperforms existing RNA FMs, achieving a 40{\%} improvement in RNA secondary structure prediction and obtaining top-tier results on DNA genomic benchmarks even though it has not been pretrained on any DNA genome. We release the code and models to encourage further research to bridge the gap between in-silico predictions and biological reality."
}