InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback

Henry Hengyuan Zhao; Wenqi Pei; Yifei Tao; Haiyang Mei; Mike Zheng Shou

doi:10.18653/v1/2025.findings-emnlp.1383

Abstract

Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework that can be applied to any LMM and dataset to assess this ability autonomously. Building on this framework, we introduce InterFeedback-Bench, which evaluates interactive intelligence on two representative datasets, MMMU-Pro and MathVerse, across 10 different open-source LMMs. Additionally, we present InterFeedback-Human, a newly collected dataset of 120 cases designed for manually testing interactive performance in leading models such as OpenAI-o1 and Claude-3.5-Sonnet. Our evaluation results show that even state-of-the-art LMMs (e.g., OpenAI-o1) correct their results through human feedback in fewer than 50% of cases. Our findings point to the need for methods that can enhance LMMs’ capabilities to interpret and benefit from feedback.

Anthology ID: 2025.findings-emnlp.1383
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 25381–25400
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1383/
DOI: 10.18653/v1/2025.findings-emnlp.1383
Cite (ACL): Henry Hengyuan Zhao, Wenqi Pei, Yifei Tao, Haiyang Mei, and Mike Zheng Shou. 2025. InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25381–25400, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback (Zhao et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1383.pdf
Checklist: 2025.findings-emnlp.1383.checklist.pdf
