OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning

Seunghee Kim; Ingyu Bang; Seokgyu Jang; Changhyeon Kim; Sanghwan Bae; Jihun Choi; Richeng Xuan; Taeuk Kim

Abstract

Multimodal Large Language Models (MLLMs) have increasingly supported omni-modal processing across text, vision, and speech. However, existing evaluation frameworks for such models suffer from critical limitations, including modality shortcuts and biased reasoning paths. To address these challenges, we propose OMHBench, a novel benchmark designed to rigorously evaluate omni-modal multi-hop reasoning. It consists of 6,144 questions with balanced reasoning paths that are jointly grounded across all three modalities. Extensive evaluation of 13 state-of-the-art models reveals that (1) a large performance gap exists between proprietary and open-source MLLMs and (2) even proprietary models exhibit high sensitivity to reasoning path variations, resulting in asymmetric omni-modal grounding. Notably, models struggle when processing the speech modality, underscoring the need for balanced, multi-hop evaluation of omni-modal intelligence.

Anthology ID: 2026.findings-acl.911
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 18311–18334
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.911/
Cite (ACL): Seunghee Kim, Ingyu Bang, Seokgyu Jang, Changhyeon Kim, Sanghwan Bae, Jihun Choi, Richeng Xuan, and Taeuk Kim. 2026. OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 18311–18334, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning (Kim et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.911.pdf
Checklist: 2026.findings-acl.911.checklist.pdf