Cross-Modal Augmentation for Low-Resource Language Understanding and Generation

Zichao Li, Zong Ke


Abstract
This paper introduces a multimodal retrieval-augmented generation (RAG) system designed to enhance language understanding and generation for low-resource languages. By integrating textual, visual, and geospatial data, the system leverages cross-lingual adaptation and multimodal augmentation to bridge the gap between high-resource and low-resource languages. Evaluated on the MM-COVID and LORELEI datasets, the system demonstrates superior performance in retrieval (precision: 85%, recall: 82%) and generation (BLEU: 28.4) tasks compared to baselines. Case studies in public health communication and disaster response highlight its practical utility. The results underscore the potential of multimodal AI to democratize access to technology and address global challenges in low-resource settings.
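The abstract describes retrieval over fused textual, visual, and geospatial evidence feeding a generator. Below is a minimal, hypothetical sketch of that kind of cross-modal retrieval step; the function names, document fields, and toy encoder are illustrative assumptions and are not taken from the paper, which would use real multilingual/multimodal encoders and learned fusion.

```python
# Hypothetical sketch of a cross-modal retrieval step (not the authors' code).
# Each "document" carries text plus serialized visual/geospatial metadata; all fields
# are embedded into one shared vector space and ranked by cosine similarity to the query.
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding; a stand-in for a real multilingual/multimodal encoder."""
    seed = int(hashlib.md5(text.encode("utf-8")).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve(query: str, documents: list[dict], k: int = 2) -> list[dict]:
    """Rank documents by cosine similarity between the query and fused multimodal fields."""
    q = embed(query)
    scored = []
    for doc in documents:
        # Fuse modalities by concatenating their textual surrogates before embedding;
        # semantic quality depends entirely on the real encoder substituted here.
        fused = " ".join([doc["text"], doc.get("image_caption", ""), doc.get("geo", "")])
        scored.append((float(q @ embed(fused)), doc))
    return [doc for _, doc in sorted(scored, key=lambda s: -s[0])[:k]]

if __name__ == "__main__":
    docs = [
        {"text": "flood warning issued", "image_caption": "satellite view of river overflow", "geo": "district A"},
        {"text": "vaccination schedule update", "image_caption": "clinic poster", "geo": "district B"},
    ]
    context = retrieve("river flooding near district A", docs)
    print(context)  # retrieved context would then be placed in the generator's prompt
```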
Anthology ID:
2025.magmar-1.9
Volume:
Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Reno Kriz, Kenton Murray
Venues:
MAGMaR | WS
Publisher:
Association for Computational Linguistics
Pages:
90–99
URL:
https://preview.aclanthology.org/landing_page/2025.magmar-1.9/
Cite (ACL):
Zichao Li and Zong Ke. 2025. Cross-Modal Augmentation for Low-Resource Language Understanding and Generation. In Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025), pages 90–99, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Cross-Modal Augmentation for Low-Resource Language Understanding and Generation (Li & Ke, MAGMaR 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.magmar-1.9.pdf