Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation
Zihao Li, Xuekong Xu, Ziyao Chen, Lixin Zou, Ethanhjwu Ethanhjwu, Qiang Chen, Chenliang Li
Abstract
Multi-style outline controllable generation is crucial for multiple applications, including document semantic structuring and retrieval-augmented generation. The success of preference alignment approaches encourages their application to controllable generation tasks. However, these attempts face several limitations: (1) they require response pairs, (2) they incur substantial computational costs, and (3) they under-exploit fine-grained preference signals. To address these problems, we propose a token-level preference self-alignment optimization, named TKPO, for outline controllable generation. TKPO extends the Bradley-Terry model from pair-wise to list-wise comparison, which is further applied at the token level to exploit fine-grained preference signals. Unlike representative methods such as DPO, TKPO does not require response pairs; instead, we propose a controllable-attributes-driven method to construct reject samples for self-alignment. Additionally, TKPO optimizes only the base model, thereby avoiding additional memory usage and substantial computational costs. We curate two outline controllable generation datasets covering language style and level of detail. Extensive experiments demonstrate that TKPO outperforms DPO by up to 19.28% while requiring only 56.25% of the training time. We release the code and dataset resources at https://github.com/WHUIR/TKPO.
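The abstract does not spell out the objective, but as a rough illustration of the core idea (a list-wise Bradley-Terry comparison applied per token, with one accepted response ranked against constructed rejects), here is a minimal PyTorch sketch. The function name, the (K, T) tensor layout, and the uniform averaging over token positions are all assumptions for illustration, not the authors' implementation.

```python
import torch

def token_level_listwise_bt_loss(logps: torch.Tensor) -> torch.Tensor:
    """Sketch of a list-wise Bradley-Terry objective applied per token.

    logps: (K, T) per-token log-probabilities under the policy, where
    row 0 is the accepted response and rows 1..K-1 are reject samples
    (e.g., constructed by perturbing controllable attributes).
    """
    # Plackett-Luce-style probability that the accepted response
    # "wins" the list-wise comparison at each token position.
    win_logprob = logps[0] - torch.logsumexp(logps, dim=0)  # (T,)
    # Aggregate the token-level preference signal into a scalar loss.
    return -win_logprob.mean()

# Toy usage: 4 candidates (1 accepted + 3 rejects), 8 token positions.
fake_logps = (torch.randn(4, 8) - 3.0).requires_grad_()
loss = token_level_listwise_bt_loss(fake_logps)
loss.backward()
```

Note that, consistent with the abstract's claim, such a loss involves only the base policy's log-probabilities; no frozen reference model is kept in memory, unlike DPO.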
- Anthology ID:
- 2025.findings-acl.823
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venues:
- Findings | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15974–16007
- URL:
- https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.823/
- Cite (ACL):
- Zihao Li, Xuekong Xu, Ziyao Chen, Lixin Zou, Ethanhjwu Ethanhjwu, Qiang Chen, and Chenliang Li. 2025. Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15974–16007, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation (Li et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.823.pdf