CARMO: Dynamic Criteria Generation for Context Aware Reward Modelling

Taneesh Gupta, Shivam Shandilya, Xuchao Zhang, Rahul Madhavan, Supriyo Ghosh, Chetan Bansal, Huaxiu Yao, Saravan Rajmohan


Abstract
Reward modeling in large language models is known to be susceptible to reward hacking, causing models to latch onto superficial features such as the tendency to generate lists or unnecessarily long responses. In RLHF, and more generally during post-training, flawed reward signals often lead to outputs that optimize for these spurious correlates instead of genuine quality or correctness. We propose **Carmo (Context-Aware Reward Modeling)**, a novel approach that first generates dynamic, context-relevant criteria to ground the reward model before it produces reward scores. Unlike prior methods that use static rubrics, Carmo leverages powerful LLMs to adaptively create evaluation criteria (e.g., logical consistency, clarity, and depth) tailored to the user query. Our theoretical analysis shows that such criteria generation can mitigate reward hacking. We further demonstrate how Carmo can be distilled into smaller models, thereby lowering the computational cost of alignment. We establish new state-of-the-art performance in the zero-shot setting for generative models, with a 2.1% improvement on Reward Bench. Furthermore, alignment performed on the Carmo-curated preference dataset achieves **22.5% LC-WR and 21.1% WR on Mistral-Base (7B)**. We release our datasets at [huggingface/CARMO](https://huggingface.co/datasets/Multi-preference-Optimization/CARMO-UltraFeedback).
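The two-stage idea described in the abstract (generate query-specific criteria first, then score against them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `generate_criteria`, `score_response`, and `prefer`, the prompt wording, and the 1-to-10 scale are all assumptions made for illustration, and `llm` stands for any user-supplied callable that maps a prompt string to a completion string.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def generate_criteria(llm: LLM, query: str) -> List[str]:
    """Stage 1: ask the model for evaluation criteria tailored to the query."""
    prompt = (
        "List evaluation criteria, one per line, for judging answers to:\n"
        f"{query}"
    )
    criteria = []
    for line in llm(prompt).splitlines():
        # Drop leading bullet characters and surrounding whitespace.
        item = line.strip().lstrip("-*").strip()
        if item:
            criteria.append(item)
    return criteria


def score_response(llm: LLM, query: str, response: str,
                   criteria: List[str]) -> float:
    """Stage 2: score a response, grounding the judge in the criteria."""
    rubric = "\n".join(f"- {c}" for c in criteria)
    prompt = (
        f"Query:\n{query}\n\nResponse:\n{response}\n\n"
        f"Criteria:\n{rubric}\n\n"
        "Return a single score from 1 to 10."  # assumed scale
    )
    return float(llm(prompt).strip())


def prefer(llm: LLM, query: str, a: str, b: str) -> str:
    """Return 'a' or 'b' depending on which response scores higher."""
    criteria = generate_criteria(llm, query)  # shared by both responses
    score_a = score_response(llm, query, a, criteria)
    score_b = score_response(llm, query, b, criteria)
    return "a" if score_a >= score_b else "b"
```

Generating the criteria once per query and reusing them for both candidates keeps the comparison on a common rubric; a real system would also need to handle completions that do not parse as a number.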
Anthology ID:
2025.findings-acl.114
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
2202–2261
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.114/
Cite (ACL):
Taneesh Gupta, Shivam Shandilya, Xuchao Zhang, Rahul Madhavan, Supriyo Ghosh, Chetan Bansal, Huaxiu Yao, and Saravan Rajmohan. 2025. CARMO: Dynamic Criteria Generation for Context Aware Reward Modelling. In Findings of the Association for Computational Linguistics: ACL 2025, pages 2202–2261, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
CARMO: Dynamic Criteria Generation for Context Aware Reward Modelling (Gupta et al., Findings 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.114.pdf