Haijun Wu


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Token-level Preference Self-Alignment Optimization for Multi-style Outline Controllable Generation
Zihao Li | Xuekong Xu | Ziyao Chen | Lixin Zou | Haijun Wu | Qiang Chen | Chenliang Li
Findings of the Association for Computational Linguistics: ACL 2025

Multi-style outline controllable generation is crucial for multiple applications, including document semantic structuring and retrieval-augmented generation.The great success of preference alignment approaches encourages their application in controllable generation tasks.However, these attempts encounter several limitations: (1) response pair requirements, (2) substantial computation costs, and (3) insufficient exploitation of fine-grained preference signals.To address these problems, we propose a token-level preference self-alignment optimization, named TKPO, for outline controllable generation. TKPO extends the Bradley-Terry model from pair-wise to list-wise comparison, which is further applied at the token level for fine-grained preference signal utilization. In comparison to the representative methods, e.g., DPO, TKPO does not require response pairs; instead, we propose a controllable attributes-driven method to construct reject samples for self-alignment. Additionally, TKPO optimizes only the base model, thereby avoiding additional memory usage and substantial computational costs.We curate two outline controllable generation datasets with regard to language style and level-of-detail.Extensive experiments demonstrate that TKPO outperforms DPO by up to 19.28% in performance while requiring only 56.25% in training time.We release the code and datasets resources at https://github.com/WHUIR/TKPO.