@inproceedings{ichihara-jinnai-2025-auto,
title = "Auto-Weighted Group Relative Preference Optimization for Multi-Objective Text Generation Tasks",
author = "Ichihara, Yuki and
Jinnai, Yuu",
editor = "Potdar, Saloni and
Rojas-Barahona, Lina and
Montella, Sebastien",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track",
month = nov,
year = "2025",
    address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.80/",
pages = "1134--1147",
ISBN = "979-8-89176-333-3",
    abstract = "Group Relative Policy Optimization (GRPO) is a promising approach to complex, real-world tasks, such as those involving multiple rewards or strict constraints. However, when training GRPO with multiple rewards, the weights of each reward must be decided in advance. Failing to balance the objectives adequately can lead to overfitting or insufficient learning of each reward function. To address this problem, we propose Auto-Weighted Group Relative Policy Optimization (AW-GRPO), which adjusts reward weights during training according to the progress of the learning of each objective so far. We evaluate AW-GRPO on advertising text generation, a real-world problem where the generated text must satisfy multiple objectives, such as quality and diversity, while adhering to the constraints of the media (e.g., maximum number of characters). Our results show that AW-GRPO successfully balances multiple objectives, improving the overall scores while reducing the constraint violation rate. We additionally evaluate AW-GRPO using publicly available benchmark problems for reproducibility, in which we observe the same qualitative result that the proposed method outperforms GRPO."
}
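The abstract describes the weighting mechanism only at a high level: reward weights are adjusted during training according to how much each objective has already improved. As a rough illustration of that idea (not the authors' published algorithm), the following Python sketch upweights objectives whose reward has progressed the least so far and scalarizes the per-objective rewards for a GRPO-style group advantage. All names, the inverse-progress update rule, and the numbers are illustrative assumptions.

```python
# Hypothetical sketch of auto-weighted multi-reward aggregation.
# The inverse-progress rule below is an assumption for illustration;
# the paper's actual update rule may differ.
import numpy as np

def auto_weights(initial_scores, current_scores, eps=1e-8):
    """Upweight objectives that have improved the least since training began."""
    progress = np.maximum(current_scores - initial_scores, 0.0)
    # Less progress -> larger weight; eps avoids division by zero.
    # A practical version would likely smooth or temperature-scale this.
    raw = 1.0 / (progress + eps)
    return raw / raw.sum()

def combined_reward(per_objective_rewards, weights):
    """Scalarize a vector of per-objective rewards for the group advantage."""
    return float(np.dot(weights, per_objective_rewards))

# Example: three objectives (e.g., quality, diversity, constraint satisfaction).
init = np.array([0.40, 0.40, 0.40])
curr = np.array([0.70, 0.45, 0.42])   # quality has improved the most
w = auto_weights(init, curr)
print(w)  # weight shifts toward the lagging diversity/constraint objectives
print(combined_reward(np.array([0.8, 0.5, 0.3]), w))
```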