CARL: Constraint-Aware Reinforcement Learning for Planning with LLMs

Qiuyi Qi, Jinjian Zhang, Mutian Bao, Tian Liang, Guocong Li, Dongnan Liu, Wei Zhou, Jie Liu, Ming Kong, Linjian Mo, Feng Zhang, Qiang Zhu


Abstract
Despite their strong reasoning capabilities and extensive world knowledge, Large Language Models (LLMs) frequently generate plans that violate task constraints, undermining their reliability in real-world applications. This deficiency arises from a lack of systematic mechanisms to incorporate constraint information during the generation process. While existing approaches attempt to mitigate this by relying on external tools or task decomposition, they fail to enhance the model’s intrinsic constraint awareness. To address this, we propose Constraint-Aware Reinforcement Learning (CARL), a novel RL framework designed to strengthen LLMs’ intrinsic focus on constraints. CARL introduces a constraint-aware reward by comparing the model’s output distributions under constrained and unconstrained inputs, encouraging constraint focus and penalizing neglect. Compatible with various RL frameworks and requiring no external solvers or top models, CARL enables scalable, end-to-end constraint-aware planning. Extensive experiments on BlocksWorld, TravelPlanner, and T-Eval demonstrate that CARL significantly outperforms standard Reinforcement Fine-Tuning (RFT) baselines and state-of-the-art reasoning models, exhibiting a markedly increased focus on constraints.
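The abstract describes the constraint-aware reward only at a high level, so the following is a minimal illustrative sketch rather than the paper's formulation. It assumes the comparison between constrained and unconstrained output distributions is a mean per-position KL divergence, and that the resulting score is added to a task reward as a shaped bonus. The function names and the parameters `beta` and `tau` are hypothetical and do not come from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def constraint_focus_score(constrained_dists, unconstrained_dists):
    """Mean per-position KL between next-token distributions computed
    with the constraints in the prompt and with the constraints removed.

    Assumption: a larger divergence means the output depends more on
    the constraints. This is an illustrative proxy, not the paper's metric.
    """
    if len(constrained_dists) != len(unconstrained_dists):
        raise ValueError("both inputs must cover the same positions")
    if not constrained_dists:
        return 0.0
    kls = [
        kl_divergence(p, q)
        for p, q in zip(constrained_dists, unconstrained_dists)
    ]
    return sum(kls) / len(kls)


def shaped_reward(task_reward, focus_score, beta=0.1, tau=0.05):
    """Add a bonus when the focus score exceeds tau and a penalty when it
    falls below tau. beta and tau are hypothetical hyperparameters."""
    return task_reward + beta * (focus_score - tau)


# Toy example: two positions over a two-token vocabulary.
with_constraints = [[0.9, 0.1], [0.8, 0.2]]
without_constraints = [[0.5, 0.5], [0.5, 0.5]]
score = constraint_focus_score(with_constraints, without_constraints)
reward = shaped_reward(task_reward=1.0, focus_score=score)
```

Under these assumptions, an output whose distribution does not change when the constraints are removed scores zero and receives a small penalty, while an output that shifts noticeably receives a bonus on top of the task reward.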
Anthology ID:
2026.findings-acl.1069
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
21257–21281
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1069/
Cite (ACL):
Qiuyi Qi, Jinjian Zhang, Mutian Bao, Tian Liang, Guocong Li, Dongnan Liu, Wei Zhou, Jie Liu, Ming Kong, Linjian Mo, Feng Zhang, and Qiang Zhu. 2026. CARL: Constraint-Aware Reinforcement Learning for Planning with LLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 21257–21281, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
CARL: Constraint-Aware Reinforcement Learning for Planning with LLMs (Qi et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1069.pdf
Checklist:
2026.findings-acl.1069.checklist.pdf