IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

Xinghua Zhang, Haiyang Yu, Cheng Fu, Fei Huang, Yongbin Li


Abstract
In the realm of large language models (LLMs), the ability to accurately follow instructions is paramount, as more and more agents and applications are built on LLMs and the complexity of the instructions they must handle is rapidly increasing. However, complex instruction evaluation data remains scarce, and there are no dedicated algorithms for improving the ability to follow complex instructions. To this end, this paper introduces Trace, a benchmark for improving and evaluating complex instruction-following ability, which consists of 120K training instances and 1K evaluation instances. Furthermore, we propose IOPO (Input-Output Preference Optimization), an alignment method that takes both input and output preference pairs into consideration, so that LLMs not only rapidly align with response preferences but also meticulously explore the instruction preferences. Extensive experiments on both in-domain and out-of-domain datasets confirm the effectiveness of IOPO, with improvements of 8.15% and 2.18% on in-domain data and 5.91% and 2.83% on out-of-domain data over SFT and DPO, respectively. Our code and dataset are released at https://anonymous.4open.science/r/Code7-34A5.
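The abstract does not give the paper's exact objective, but a minimal sketch of how an IOPO-style loss might pair a DPO-like output-preference term with an analogous input-preference term is shown below. The pair semantics (a fixed instruction with chosen/rejected responses, and a fixed response scored under a better- vs. worse-matching instruction), the function names, and the lam weighting are illustrative assumptions, not the published formulation.

import torch
import torch.nn.functional as F

def preference_term(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO-style log-sigmoid term over implicit reward margins:
    # -log sigma(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)])
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin)

def iopo_loss(
    # Output-preference pair: fixed instruction x, chosen response y_w vs. rejected y_l.
    logp_yw, logp_yl, ref_logp_yw, ref_logp_yl,
    # Input-preference pair (assumed semantics): fixed response y, scored under the
    # better-matching instruction x_w vs. the worse-matching instruction x_l.
    logp_xw, logp_xl, ref_logp_xw, ref_logp_xl,
    beta=0.1, lam=1.0,  # lam: hypothetical weight balancing the two terms
):
    out_term = preference_term(logp_yw, logp_yl, ref_logp_yw, ref_logp_yl, beta)
    in_term = preference_term(logp_xw, logp_xl, ref_logp_xw, ref_logp_xl, beta)
    return (out_term + lam * in_term).mean()

# Toy usage with per-example sequence log-probabilities (batch of 2).
if __name__ == "__main__":
    t = lambda *v: torch.tensor(v)
    loss = iopo_loss(
        t(-12.0, -15.0), t(-20.0, -22.0), t(-13.0, -15.5), t(-19.0, -21.0),
        t(-11.0, -14.0), t(-18.0, -19.0), t(-12.5, -14.5), t(-17.0, -18.5),
    )
    print(loss.item())

All sequence log-probabilities are assumed to be computed elsewhere, under the policy and a frozen reference model as in standard DPO training.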
Anthology ID: 2025.acl-long.1079
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 22185–22200
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1079/
Cite (ACL):
Xinghua Zhang, Haiyang Yu, Cheng Fu, Fei Huang, and Yongbin Li. 2025. IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22185–22200, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization (Zhang et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1079.pdf