Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation

Zihao Li, Xuekong Xu, Ziyao Chen, Lixin Zou, Ethanhjwu Ethanhjwu, Qiang Chen, Chenliang Li


Abstract
Multi-style outline controllable generation is crucial for multiple applications, including document semantic structuring and retrieval-augmented generation. The great success of preference alignment approaches encourages their application in controllable generation tasks. However, these attempts encounter several limitations: (1) response pair requirements, (2) substantial computation costs, and (3) insufficient exploitation of fine-grained preference signals. To address these problems, we propose a token-level preference self-alignment optimization, named TKPO, for outline controllable generation. TKPO extends the Bradley-Terry model from pair-wise to list-wise comparison, which is further applied at the token level for fine-grained preference signal utilization. In comparison to representative methods, e.g., DPO, TKPO does not require response pairs; instead, we propose a controllable attributes-driven method to construct rejected samples for self-alignment. Additionally, TKPO optimizes only the base model, thereby avoiding additional memory usage and substantial computational costs. We curate two outline controllable generation datasets with regard to language style and level-of-detail. Extensive experiments demonstrate that TKPO outperforms DPO by up to 19.28% in performance while requiring only 56.25% of the training time. We release the code and dataset resources at https://github.com/WHUIR/TKPO.
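The abstract's core idea, extending the Bradley-Terry pair-wise comparison to a list-wise one, can be illustrated with a standard list-wise Bradley-Terry (Plackett-Luce) negative log-likelihood. This is a generic sketch of that objective over an ordered list of preference scores, not the paper's actual TKPO loss; the function name and the use of raw scalar scores are illustrative assumptions.

```python
import math

def listwise_bt_nll(scores):
    """Negative log-likelihood of a ranking under the list-wise
    Bradley-Terry (Plackett-Luce) model.

    `scores` holds real-valued preference scores for candidates ordered
    from most to least preferred; higher score means more preferred.
    With exactly two candidates this reduces to the familiar pair-wise
    Bradley-Terry loss, -log(sigmoid(s1 - s2)).
    """
    nll = 0.0
    for k in range(len(scores) - 1):
        rest = scores[k:]
        # Stable log-sum-exp over the remaining candidates.
        m = max(rest)
        lse = m + math.log(sum(math.exp(s - m) for s in rest))
        # Probability that candidate k is picked first among the rest.
        nll -= scores[k] - lse
    return nll
```

A list whose scores agree with the preference order yields a lower loss than the same scores in reversed order, which is the signal a list-wise alignment objective exploits; in TKPO this comparison is further applied at the token level.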
Anthology ID:
2025.findings-acl.823
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
15974–16007
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.823/
Cite (ACL):
Zihao Li, Xuekong Xu, Ziyao Chen, Lixin Zou, Ethanhjwu Ethanhjwu, Qiang Chen, and Chenliang Li. 2025. Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15974–16007, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation (Li et al., Findings 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.823.pdf