Subhadip Baidya
2025
RELIC: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples
Soumya Suvra Ghosal | Vaibhav Singh | Akash Ghosh | Soumyabrata Pal | Subhadip Baidya | Sriparna Saha | Dinesh Manocha
Findings of the Association for Computational Linguistics: EMNLP 2025
Reward models are essential for aligning large language models (LLMs) with human preferences. However, most open-source multilingual reward models are primarily trained on preference datasets in high-resource languages, resulting in unreliable reward signals for low-resource Indic languages. Collecting large-scale, high-quality preference data for these languages is prohibitively expensive, making preference-based training approaches impractical. To address this challenge, we propose RELIC, a novel in-context learning framework for reward modeling in low-resource Indic languages. RELIC trains a retriever with a pairwise ranking objective to select in-context examples from auxiliary high-resource languages that most effectively highlight the distinction between preferred and less-preferred responses. Extensive experiments on three preference datasets—PKU-SafeRLHF, WebGPT, and HH-RLHF—using state-of-the-art open-source reward models demonstrate that RELIC significantly improves reward model accuracy for low-resource Indic languages, consistently outperforming existing example selection methods. For example, on Bodo—a low-resource Indic language—using a LLaMA-3.2-3B reward model, RELIC achieves a 12.81% and 10.13% improvement in accuracy over zero-shot prompting and the state-of-the-art example selection method, respectively.
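The pairwise ranking objective used to train the retriever can be illustrated with a minimal sketch. This is an assumption about the general form of such an objective (a margin-based ranking loss over candidate in-context examples), not the paper's exact formulation; the function name `pairwise_ranking_loss` and the `margin` hyperparameter are illustrative.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (not the authors' implementation): a margin-based pairwise
# ranking loss for training a retriever that scores candidate in-context
# examples. `score_pos` / `score_neg` are hypothetical retriever scores for
# examples that do / do not help the reward model separate the preferred
# response from the less-preferred one.

def pairwise_ranking_loss(score_pos: torch.Tensor,
                          score_neg: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    # Push the retriever to rank helpful examples above unhelpful ones
    # by at least `margin`.
    return F.relu(margin - (score_pos - score_neg)).mean()

# Usage with dummy scores for a batch of candidate example pairs.
if __name__ == "__main__":
    score_pos = torch.randn(8, requires_grad=True)
    score_neg = torch.randn(8, requires_grad=True)
    loss = pairwise_ranking_loss(score_pos, score_neg)
    loss.backward()
    print(f"pairwise ranking loss: {loss.item():.4f}")
```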