Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following

Qingyu Ren; Qianyu He; Powei Chang; Jie Zeng; Zeye Sun; Fei Yu; Jiaqing Liang; Yanghua Xiao

Abstract

Language models often struggle to follow multi-constraint instructions, which are crucial for real-world applications. Existing reinforcement learning (RL) approaches depend on external supervision and suffer from sparse reward signals in multi-constraint tasks. We propose a label-free, self-supervised RL framework that removes the dependency on external supervision by deriving reward signals directly from instructions and generating pseudo-labels for reward model training. Our approach introduces constraint decomposition strategies and efficient constraint-wise binary classification to address the sparse-reward problem while maintaining computational efficiency. Experiments show that our approach generalizes well, achieving strong improvements across three in-domain and five out-of-domain datasets, including challenging agentic and multi-turn instruction-following tasks. We will open-source our code and data to facilitate future research.

Anthology ID: 2026.acl-long.217
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4755–4776
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.217/
Cite (ACL): Qingyu Ren, Qianyu He, Powei Chang, Jie Zeng, Zeye Sun, Fei Yu, Jiaqing Liang, and Yanghua Xiao. 2026. Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4755–4776, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following (Ren et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.217.pdf
Checklist: 2026.acl-long.217.checklist.pdf
