Beyond Ranking: Fine-Grained Diagnostics and Self-Improvement for MLLMs

Mingze Xu; Zijing Zhao; Qiming Peng; Houwen Peng; Han Hu; Zhanhui Kang; Yuxing Han

Abstract

While Multimodal Large Language Models (MLLMs) are advancing rapidly, accurately evaluating their capabilities remains challenging. Current paradigms primarily rely on holistic scoring and static leaderboards, which fail to disentangle fine-grained competencies. Specifically, they suffer from “Outcome Bias” by validating only final answers and ignoring intermediate reasoning. To address these limitations, we introduce ATOM (AnaTomy Of MLLM), a novel MLLM-as-a-judge framework designed to shift the focus from ranking to fine-grained diagnosis. ATOM decomposes complex reasoning into atomic criteria anchored in visual elements, enforcing verification against explicit visual facts. Validated on a newly constructed benchmark with rigorous human rankings, ATOM achieves state-of-the-art accuracy, surpassing the strongest baseline by up to 7.92%. Moving beyond ranking, ATOM bridges the gap between assessment and alignment: by pinpointing atomic-level failures, it establishes a closed-loop mechanism for targeted self-correction. This approach enables models to identify and rectify errors autonomously, successfully resolving up to 39.95% of previously failed queries without human intervention.

Anthology ID: 2026.acl-long.932
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 20347–20377
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.932/
Cite (ACL): Mingze Xu, Zijing Zhao, Qiming Peng, Houwen Peng, Han Hu, Zhanhui Kang, and Yuxing Han. 2026. Beyond Ranking: Fine-Grained Diagnostics and Self-Improvement for MLLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 20347–20377, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Beyond Ranking: Fine-Grained Diagnostics and Self-Improvement for MLLMs (Xu et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.932.pdf
Checklist: 2026.acl-long.932.checklist.pdf
