A Survey on Multi-modal Intent Recognition: Recent Advances and New Frontiers
Zhihong Zhu, Fan Zhang, Yunyan Zhang, Jinghan Sun, Zhiqi Huang, Qingqing Long, Bowen Xing, Xian Wu
Abstract
Multi-modal intent recognition (MIR) requires integrating non-verbal cues from real-world contexts to enhance the understanding of human intentions, and it has attracted substantial research attention in recent years. Despite promising advancements, a comprehensive survey summarizing recent advances and new frontiers is still absent. To this end, we present a thorough and unified review of MIR covering the following aspects: (1) Extensive survey: we take the first step toward a thorough survey of this research field, covering textual, visual (image/video), and acoustic signals. (2) Unified taxonomy: we provide a unified framework, including an evaluation protocol and advanced methods, to summarize current progress in MIR. (3) Emerging frontiers: we discuss future directions such as multi-task, multi-domain, and multi-lingual MIR, and share our thoughts on each. (4) Abundant resources: we collect open-source resources, including relevant papers, data corpora, and leaderboards. We hope this survey can shed light on future research in MIR.
- Anthology ID:
- 2025.findings-emnlp.823
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15223–15236
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.823/
- DOI:
- 10.18653/v1/2025.findings-emnlp.823
- Cite (ACL):
- Zhihong Zhu, Fan Zhang, Yunyan Zhang, Jinghan Sun, Zhiqi Huang, Qingqing Long, Bowen Xing, and Xian Wu. 2025. A Survey on Multi-modal Intent Recognition: Recent Advances and New Frontiers. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15223–15236, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- A Survey on Multi-modal Intent Recognition: Recent Advances and New Frontiers (Zhu et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.823.pdf