MagicBench: Diagnosing Visual Agency Loss and Semantic Dependency in Multimodal LLMs

Tang Da Huang, Weidong Tang, Wen Qi Xu, Xianpeng Guo


Abstract
Multimodal Large Language Models (MLLMs) are typically built on the assumption that linguistic context invariably enhances visual understanding. We test this assumption in semantically adversarial scenarios, specifically magic tricks, where the narration deliberately diverges from physical reality. We introduce MagicBench, a diagnostic benchmark of 402 videos for evaluating MLLMs under hierarchical linguistic interference, together with a Physical Constraint Set (PCS) protocol for assessing adherence to physical laws. Our evaluation uncovers a Semantic Dependency Paradox: (1) Semantic anchoring: entity nouns act as anchors that aid localization, paradoxically boosting performance even when paired with false predicates. (2) Visual Agency Loss: in semantic vacuums, multimodal performance drops 12.4% (p < 0.01) below the vision-only "capability probe". This gap persists under symmetric prompting, suggesting a form of functional perception suppression in which autonomous visual search may be under-utilized in multimodal settings that lack linguistic triggers. Causal interventions via spatial prompting and signal magnification provide evidence that internal reasoning remains functional, supporting the interpretation of a perceptual access bottleneck. Our findings suggest that MLLMs function as "language-guided passive observers", and they motivate perceptually independent architectures that decouple sensory agency from linguistic dominance. Code and dataset are available at https://github.com/Ink-Dawn/MagicBench
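The abstract reports that multimodal accuracy in the semantic-vacuum condition falls 12.4% below the vision-only probe with p < 0.01. It does not name the statistical test, and it does not say whether the gap is absolute (percentage points) or relative. The following is a minimal sketch of one standard option, a paired sign-flip permutation test. It assumes that every benchmark item is scored as correct (1) or incorrect (0) under both conditions. The function names and toy data are hypothetical and are not taken from the MagicBench release. The sketch reports the absolute accuracy gap.

```python
import random
from typing import Sequence


def accuracy(correct: Sequence[int]) -> float:
    """Fraction of items answered correctly (entries are 0 or 1)."""
    return sum(correct) / len(correct)


def paired_permutation_test(
    probe: Sequence[int],
    multimodal: Sequence[int],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Return (absolute accuracy gap, two-sided p-value) for probe minus multimodal.

    Hypothetical helper: under the null hypothesis the two condition labels are
    exchangeable within each item, so each per-item difference is randomly
    sign-flipped to build the null distribution of the mean difference.
    """
    if len(probe) != len(multimodal):
        raise ValueError("both conditions must be scored on the same items")
    diffs = [p - m for p, m in zip(probe, multimodal)]
    observed = sum(diffs) / len(diffs)  # equals accuracy(probe) - accuracy(multimodal)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy per-item correctness, NOT MagicBench results.
    probe = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
    multimodal = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
    gap, p = paired_permutation_test(probe, multimodal)
    print(f"gap = {gap:.3f}, p = {p:.3f}")
```

A paired design fits this setting because both conditions are evaluated on the same videos. Items that both conditions get right, or both get wrong, therefore contribute nothing to the test statistic. With only three discordant items in the toy data, the gap is not significant. This illustrates why a benchmark of hundreds of items is needed to support a claim such as p < 0.01.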
Anthology ID: 2026.acl-long.1314
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 28493–28511
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1314/
Cite (ACL): Tang Da Huang, Weidong Tang, Wen Qi Xu, and Xianpeng Guo. 2026. MagicBench: Diagnosing Visual Agency Loss and Semantic Dependency in Multimodal LLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28493–28511, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MagicBench: Diagnosing Visual Agency Loss and Semantic Dependency in Multimodal LLMs (Da Huang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1314.pdf
Checklist: 2026.acl-long.1314.checklist.pdf