From Mimesis to Metamorphosis: Evolving VLM Judges via In-Context Comparing and Knowledge Internalization

Juntuo Wang, Yuming Qiao, Yifan Yang, Lunxi Yuan, Liang Luo, Dan Meng


Abstract
Vision-language models (VLMs) are increasingly adopted as judges for subjective assessment, yet absolute scoring remains brittle due to inconsistent scales and inherent preference biases. To bridge this gap, we propose S2AD (**Semantic-Anchored Scale-Agnostic Distillation**), a novel easy-to-hard framework that operationalizes subjective assessment as comparative analysis, conceptualizing the judge’s evolution from mimesis to metamorphosis. In Stage 1 (Mimesis), we introduce Dynamic Soft Positioning (DSP) to train the judge to compare a query against retrieved reference images, establishing a relative evaluation space that ensures consistent ordering under heterogeneous scales. In Stage 2 (Metamorphosis), this comparative capability is internalized via Language Buttons—discrete semantic levels serving as a retrieval-free internal reference. Optimized with Group Relative Policy Optimization (GRPO), S2AD achieves efficient, scale-steerable inference that adapts to diverse grading standards. Our framework reaches state-of-the-art performance across multiple benchmarks, validating the effectiveness of internalized comparative priors for robust, rank-invariant, and scale-steerable evaluation. The code is available at: https://github.com/SpatialVision-Research/SSAD_ACL2026_Findings.
Anthology ID:
2026.findings-acl.631
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12957–12971
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.631/
Cite (ACL):
Juntuo Wang, Yuming Qiao, Yifan Yang, Lunxi Yuan, Liang Luo, and Dan Meng. 2026. From Mimesis to Metamorphosis: Evolving VLM Judges via In-Context Comparing and Knowledge Internalization. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12957–12971, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
From Mimesis to Metamorphosis: Evolving VLM Judges via In-Context Comparing and Knowledge Internalization (Wang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.631.pdf
Checklist:
2026.findings-acl.631.checklist.pdf