LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information

Bowen Ping, Jiali Zeng, Fandong Meng, Shuo Wang, Jie Zhou, Shanghang Zhang


Abstract
Recent advancements in large language models (LLMs) have markedly improved their capacity to handle long text inputs; however, current models, including GPT-4o, still exhibit unsatisfactory performance in long-form generation, and producing high-quality long-form content remains a significant challenge. In this paper, we present LongDPO, a novel approach designed to enhance long-form text generation through step-level supervision. By leveraging Monte Carlo Tree Search (MCTS) to collect stepwise preference pairs and employing a global memory pool to maintain factual accuracy, LongDPO effectively mitigates issues such as inconsistencies that are prevalent in long-context LLMs. Furthermore, we integrate critique-augmented generation to refine the selected preference pairs. After collecting the stepwise preference pairs, we apply stepwise preference learning for fine-grained optimization. Experimental results demonstrate that our method enhances performance on long-form generation benchmarks (e.g., LongBench-Write) while maintaining nearly lossless performance on several general benchmarks.
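The stepwise preference learning described in the abstract builds on the standard DPO objective, applied per step rather than per full response. As a rough illustration only (the paper's exact formulation, hyperparameters, and step segmentation are not given here), a minimal sketch of the DPO loss for a single chosen/rejected step pair might look like this; all argument names and the toy log-probabilities are illustrative assumptions:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability that the trained
    policy or the frozen reference model assigns to the chosen /
    rejected generation step.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): the loss shrinks as the policy prefers
    # the chosen step more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy already slightly favors the chosen step.
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-12.0,
                ref_logp_chosen=-11.0, ref_logp_rejected=-11.0,
                beta=0.1)
```

In LongDPO this loss would be accumulated over the step-level pairs harvested from the MCTS tree rather than over whole completions; that per-step credit assignment is what the abstract means by "fine-grained optimization."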
Anthology ID:
2025.findings-acl.395
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7613–7632
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.395/
Cite (ACL):
Bowen Ping, Jiali Zeng, Fandong Meng, Shuo Wang, Jie Zhou, and Shanghang Zhang. 2025. LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information. In Findings of the Association for Computational Linguistics: ACL 2025, pages 7613–7632, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information (Ping et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.395.pdf