Text Takes Over: A Study of Modality Bias in Multimodal Intent Detection

Ankan Mullick, Saransh Sharma, Abhik Jana, Pawan Goyal


Abstract
The rise of multimodal data, integrating text, audio, and visuals, has created new opportunities for studying multimodal tasks such as intent detection. This work investigates the effectiveness of Large Language Models (LLMs) and non-LLMs, including text-only and multimodal models, in the multimodal intent detection task. Our study reveals that Mistral-7B, a text-only LLM, outperforms most competitive multimodal models by approximately 9% on the MIntRec-1 dataset and 4% on the MIntRec2.0 dataset. This performance advantage comes from a strong textual bias in these datasets, where over 90% of the samples require textual input, either alone or in combination with other modalities, for correct classification. We also confirm the modality bias of these datasets via human evaluation. Next, we propose a framework to debias the datasets. Upon debiasing, more than 70% of the samples in MIntRec-1 and more than 50% in MIntRec2.0 are removed, resulting in significant performance degradation across all models; smaller multimodal fusion models are the most affected, with an accuracy drop of over 50–60%. Further, we analyze the context-specific relevance of different modalities through empirical analysis. Our findings highlight the challenges posed by modality bias in multimodal intent datasets and emphasize the need for unbiased datasets to evaluate multimodal models effectively. We release both the code and the dataset used for this work at https://github.com/Text-Takes-Over-EMNLP-2025/MultiModal-Intent-EMNLP-2025.
Anthology ID:
2025.emnlp-main.1226
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
24039–24069
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1226/
Cite (ACL):
Ankan Mullick, Saransh Sharma, Abhik Jana, and Pawan Goyal. 2025. Text Takes Over: A Study of Modality Bias in Multimodal Intent Detection. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 24039–24069, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Text Takes Over: A Study of Modality Bias in Multimodal Intent Detection (Mullick et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1226.pdf
Checklist:
2025.emnlp-main.1226.checklist.pdf