AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning

Mengzhao Jia; Zhihan Zhang; Ignacio Cases; Zheyuan Liu; Meng Jiang; Peng Qi

Abstract

Multimodal large language models (MLLMs) have rapidly advanced from perception tasks to complex multi-step reasoning, yet reinforcement learning with verifiable rewards (RLVR) often yields spurious reasoning because only final-answer correctness is rewarded. To address this limitation, we propose AutoRubric, a framework that integrates RLVR with process-level supervision through automatically collected, rubric-based generative rewards. Its key innovation is a scalable self-aggregation method that distills consistent reasoning checkpoints from successful trajectories, enabling problem-specific rubric construction without human annotation or stronger teacher models. By jointly leveraging rubric-based and outcome rewards, the resulting model, AutoRubric-R1V, achieves state-of-the-art performance on six multimodal reasoning benchmarks and substantially improves reasoning faithfulness in dedicated evaluations.
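The abstract does not spell out the aggregation rule or how the two rewards are combined, so the following is only a minimal sketch under stated assumptions. It assumes that checkpoints have already been extracted from each successful trajectory as normalized strings, keeps a checkpoint as a rubric item when it appears in at least a fraction `min_support` of those trajectories, and mixes outcome and rubric rewards with a hypothetical weight `alpha`. In AutoRubric the rubric is scored by a generative (LLM) judge; here a plain set of satisfied items stands in for the judge's verdicts. All function names and parameters are illustrative, not the authors' implementation.

```python
from collections import Counter


def aggregate_rubric(successful_checkpoints, min_support=0.5):
    """Hypothetical self-aggregation: keep checkpoints shared by enough successful trajectories.

    successful_checkpoints: one collection of normalized checkpoint strings per trajectory.
    min_support: assumed fraction of trajectories a checkpoint must appear in.
    """
    if not successful_checkpoints:
        return []
    counts = Counter()
    for checkpoints in successful_checkpoints:
        counts.update(set(checkpoints))  # count each checkpoint at most once per trajectory
    n = len(successful_checkpoints)
    return sorted(cp for cp, c in counts.items() if c / n >= min_support)


def rubric_reward(satisfied, rubric):
    """Fraction of rubric items judged satisfied (a set lookup stands in for an LLM judge)."""
    if not rubric:
        return 0.0
    return sum(1 for item in rubric if item in satisfied) / len(rubric)


def combined_reward(answer_correct, satisfied, rubric, alpha=0.5):
    """Assumed linear mix of the binary outcome reward and the rubric reward."""
    outcome = 1.0 if answer_correct else 0.0
    return alpha * outcome + (1 - alpha) * rubric_reward(satisfied, rubric)


if __name__ == "__main__":
    trajectories = [
        {"isosceles", "angle_B_70"},
        {"isosceles", "angle_B_70", "angle_sum_180"},
        {"isosceles"},
    ]
    rubric = aggregate_rubric(trajectories)
    print(rubric)  # ['angle_B_70', 'isosceles']
    print(combined_reward(True, {"isosceles"}, rubric))  # 0.75
```

With three successful trajectories, `angle_sum_180` appears in only one of them and is filtered out, while the two shared checkpoints become the rubric. A response with the correct answer that satisfies one of the two items then receives 0.5 × 1 + 0.5 × 0.5 = 0.75. A linear mix is just one simple choice; gating the rubric reward on answer correctness would be an equally plausible alternative.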

Anthology ID:: 2026.findings-acl.1282
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 25707–25724
URL:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1282/
Cite (ACL):: Mengzhao Jia, Zhihan Zhang, Ignacio Cases, Zheyuan Liu, Meng Jiang, and Peng Qi. 2026. AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 25707–25724, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning (Jia et al., Findings 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1282.pdf
Checklist:: 2026.findings-acl.1282.checklist.pdf