Tag-Instruct: Controlled Instruction Complexity Enhancement through Structure-based Augmentation
He Zhu, Zhiwen Ruan, Junyou Su, Xingwei He, Yun Chen, Wenjia Zhang, Guanhua Chen
Abstract
High-quality instruction data is crucial for developing large language models (LLMs), yet existing approaches struggle to effectively control instruction complexity. We present Tag-Instruct, a novel framework that enhances instruction complexity through structured semantic compression and controlled difficulty augmentation. Unlike previous prompt-based methods operating on raw text, Tag-Instruct compresses instructions into a compact tag space and systematically enhances complexity through RL-guided tag expansion. Through extensive experiments, we show that Tag-Instruct outperforms existing instruction complexity augmentation approaches. Our analysis reveals that operating in tag space provides superior controllability and stability across different instruction synthesis frameworks.
- Anthology ID:
- 2025.findings-acl.911
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 17708–17729
- URL:
- https://preview.aclanthology.org/display_plenaries/2025.findings-acl.911/
- Cite (ACL):
- He Zhu, Zhiwen Ruan, Junyou Su, Xingwei He, Yun Chen, Wenjia Zhang, and Guanhua Chen. 2025. Tag-Instruct: Controlled Instruction Complexity Enhancement through Structure-based Augmentation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 17708–17729, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Tag-Instruct: Controlled Instruction Complexity Enhancement through Structure-based Augmentation (Zhu et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/display_plenaries/2025.findings-acl.911.pdf