Tianjun Pan

2026

From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks
Qingyu Ren | Tianjun Pan | Xingzhou Chen | Xuhong Wang
Findings of the Association for Computational Linguistics: ACL 2026

Large language models have achieved remarkable progress in text generation but still struggle with generative writing tasks. In terms of evaluation, existing benchmarks cover few requirement types and do not evaluate writing reward models. In terms of training, existing studies often enhance writing ability through reinforcement learning with verifiable rewards (RLVR). However, existing reward model training remains coarse-grained. To address these issues, we introduce W²Bench, a comprehensive evaluation benchmark, and WRL, a fine-grained training framework. W²Bench covers five task categories and seven requirement types, enabling systematic evaluation of both writing and writing reward models by measuring the correlation between reward rankings and golden rankings. WRL constructs positive and negative samples by dropping instruction requirements, allowing more precise reward model training. Experiments show that our models achieve substantial improvements on various writing benchmarks and exhibit strong generalization. We will release our code and data to support future research.

