From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks

Qingyu Ren, Tianjun Pan, Xingzhou Chen, Xuhong Wang


Abstract
Large language models have achieved remarkable progress in text generation but still struggle with generative writing tasks. On the evaluation side, existing benchmarks cover few requirement types, and writing reward models themselves are not evaluated. On the training side, existing studies often enhance writing ability through reinforcement learning with verifiable rewards (RLVR), but reward model training remains coarse-grained. To address these issues, we introduce W²Bench, a comprehensive evaluation benchmark, and WRL, a fine-grained training framework. W²Bench covers five task categories and seven requirement types, enabling systematic evaluation of both writing models and writing reward models; reward models are evaluated by measuring the correlation between their reward rankings and golden rankings. WRL constructs positive and negative examples by dropping instruction requirements, allowing more precise reward model training. Experiments show that our models achieve substantial improvements on various writing benchmarks and exhibit strong generalization. We will release our code and data to support future research.
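The abstract names two mechanisms without detailing them: scoring a reward model by how well its ranking of candidate responses correlates with a golden ranking, and building training contrasts by dropping requirements from an instruction. The sketch below illustrates both under stated assumptions. Kendall's tau-a is used here as one common rank-correlation measure, which is an assumption, since the abstract does not name the correlation metric. The prompt format and the helper names `build_prompt` and `requirement_drop_variants` are hypothetical and do not reflect the paper's actual WRL pipeline.

```python
from itertools import combinations


def kendall_tau(reward_scores, gold_ranks):
    """Kendall's tau-a between reward-model scores and golden ranks.

    Assumption: gold_ranks uses 1 = best, and a higher reward means a better
    response. Tied pairs count as neither concordant nor discordant.
    """
    n = len(reward_scores)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        reward_diff = reward_scores[i] - reward_scores[j]
        gold_diff = gold_ranks[j] - gold_ranks[i]  # > 0 if i is ranked better than j
        product = reward_diff * gold_diff
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def build_prompt(task, requirements):
    """Hypothetical prompt format: the task followed by one bullet per requirement."""
    return "\n".join([task] + [f"- {r}" for r in requirements])


def requirement_drop_variants(task, requirements):
    """Hypothetical sketch: the full prompt plus one variant per dropped requirement.

    A response written for a variant is assumed to be a candidate negative with
    respect to the full instruction, because it was never asked to satisfy the
    dropped requirement.
    """
    full = build_prompt(task, requirements)
    variants = []
    for k, dropped in enumerate(requirements):
        kept = requirements[:k] + requirements[k + 1:]
        variants.append((dropped, build_prompt(task, kept)))
    return full, variants
```

For example, `kendall_tau([0.9, 0.5, 0.1], [1, 2, 3])` returns 1.0 because the reward ordering matches the golden ordering exactly, and a fully reversed ordering returns -1.0. Swapping in Spearman's rho or a tie-corrected tau-b would be an equally reasonable choice for this kind of evaluation.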
Anthology ID:
2026.findings-acl.134
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2796–2810
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.134/
Cite (ACL):
Qingyu Ren, Tianjun Pan, Xingzhou Chen, and Xuhong Wang. 2026. From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks. In Findings of the Association for Computational Linguistics: ACL 2026, pages 2796–2810, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks (Ren et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.134.pdf
Checklist:
2026.findings-acl.134.checklist.pdf