OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning
Seunghee Kim, Ingyu Bang, Seokgyu Jang, Changhyeon Kim, Sanghwan Bae, Jihun Choi, Richeng Xuan, Taeuk Kim
Abstract
Multimodal Large Language Models (MLLMs) have increasingly supported omni-modal processing across text, vision, and speech. However, existing evaluation frameworks for such models suffer from critical limitations, including modality shortcuts and biased reasoning paths. To address these challenges, we propose OMHBench, a novel benchmark designed to rigorously evaluate omni-modal multi-hop reasoning. It consists of 6,144 questions with balanced reasoning paths that are jointly grounded across all three modalities. Extensive evaluation of 13 state-of-the-art models reveals that (1) a large performance gap exists between proprietary and open-source MLLMs and (2) even proprietary models exhibit high sensitivity to reasoning path variations, resulting in asymmetric omni-modal grounding. Notably, models struggle when processing the speech modality, underscoring the need for balanced, multi-hop evaluation of omni-modal intelligence.
- Anthology ID:
- 2026.findings-acl.911
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 18311–18334
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.911/
- Cite (ACL):
- Seunghee Kim, Ingyu Bang, Seokgyu Jang, Changhyeon Kim, Sanghwan Bae, Jihun Choi, Richeng Xuan, and Taeuk Kim. 2026. OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 18311–18334, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning (Kim et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.911.pdf