MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction

Heng Yang, Ke Li


Abstract
RNA foundation models (FMs) have been extensively used to interpret genomic sequences and address a wide range of in-silico genomic tasks. However, current RNA FMs often overlook secondary structure during pretraining, which limits their effectiveness on various genomic tasks. To address this problem, we leverage filtered high-fidelity structure annotations for structure pretraining to enhance the modeling ability of FMs in single-nucleotide-resolution tasks. Experimental evaluations across four comprehensive genomic benchmarks demonstrate that our RNA FM consistently outperforms existing RNA FMs, achieving a 40% improvement in RNA secondary structure prediction and obtaining top-tier results on DNA genomic benchmarks even though it has not been pretrained on any DNA genome. We release the code and models to encourage further research to bridge the gap between in-silico predictions and biological reality.
Anthology ID:
2024.findings-emnlp.304
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5278–5296
URL:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.304/
DOI:
10.18653/v1/2024.findings-emnlp.304
Cite (ACL):
Heng Yang and Ke Li. 2024. MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 5278–5296, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction (Yang & Li, Findings 2024)
PDF:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.304.pdf