Yingxu He
2025
MERaLiON-AudioLLM: Advancing Speech and Language Understanding for Singapore
Yingxu He | Zhuohan Liu | Geyu Lin | Shuo Sun | Bin Wang | Wenyu Zhang | Xunlong Zou | Nancy F. Chen | AiTi Aw
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
We introduce MERaLiON-AudioLLM, the first general-purpose audio-based large language model designed for multitask learning, with a particular focus on Singlish understanding. Trained on 62 million multimodal instruction samples comprising a total of 260k hours of audio, it exhibits strong generalization across a diverse set of tasks, including, but not limited to, automatic speech recognition, spoken question answering, speech translation, and paralinguistic analysis. Our results show significant improvements in local speech recognition and task-specific understanding, making MERaLiON-AudioLLM a leading solution for region-specific AI applications. An interactive demo has been developed to enable user-friendly interactions, supported by a backend with customized caching and load-balancing mechanisms. We benchmark the model across a broad range of multilingual and multitask scenarios, where it demonstrates competitive performance compared to other open-source models. The demo page, model weights, and videos are publicly accessible.