CoT-Edit: Reinforcement Learning of Chain-of-Thought Reasoning for Code Edit Suggestion

Wuya Chen, Yihao Yang, Yue Lin


Abstract
Code edit suggestion, which encompasses modifying, refactoring, and maintaining existing code, represents the most frequent software development activity and has become a focal point for AI-powered tools. Traditional methods translate explicit natural language instructions into code edits, while pattern-based approaches learn from users’ historical editing patterns to provide style-consistent and more accurate suggestions. However, these pattern-based methods still face two critical challenges: (1) difficulty handling edits that demand deep contextual reasoning, and (2) lack of interpretability in editing decisions. To address these challenges, we propose CoT-Edit, a reinforcement learning framework that guides LLMs to discover chain-of-thought (CoT) reasoning paths for code editing without requiring human-annotated CoT data. Specifically, we design a multi-step reasoning framework that enables (1) analysis-guided code editing and (2) seamless switching between CoT and non-CoT inference modes. Building on this, we introduce Edit-Aware Reward Modeling (EARM), a fine-grained diff-based reward approach for effective learning. Furthermore, we discover a LoRA merging strategy that enhances model generalization. Evaluations on an industrial dataset show that our approach achieves 60.2% edit accuracy, outperforming all strong baselines. Online A/B tests further confirm its effectiveness in production. Code is available at https://github.com/202230483077yyh/CoT-Edit.
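The abstract does not spell out how EARM computes its reward, so the following is only an illustrative sketch of what a fine-grained, diff-based reward could look like, not the authors' formulation. It assumes line-level diffs and scores a predicted edit by the F1 overlap between its changed lines and those of a reference edit. The function names `edit_lines` and `diff_reward` are hypothetical and do not come from the paper or its repository.

```python
import difflib


def edit_lines(before: str, after: str) -> set[tuple[str, str]]:
    """Collect the line-level changes that turn `before` into `after`.

    Each change is ('-', line) for a removed line or ('+', line) for an
    added line. Using a set ignores duplicate changes, which is a
    simplification made for this sketch.
    """
    changes = set()
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        tag, text = line[:2], line[2:]
        if tag in ("- ", "+ "):  # skip unchanged ("  ") and hint ("? ") lines
            changes.add((tag[0], text))
    return changes


def diff_reward(original: str, predicted: str, reference: str) -> float:
    """Assumed reward: F1 overlap between predicted and reference edits."""
    pred = edit_lines(original, predicted)
    gold = edit_lines(original, reference)
    if not pred and not gold:
        return 1.0  # both agree that nothing should change
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


original = "a = 1\nb = 2\nc = 3"
reference = "a = 1\nb = 20\nc = 3"
over_edit = "a = 1\nb = 20\nc = 30"

exact_score = diff_reward(original, reference, reference)
partial_score = diff_reward(original, over_edit, reference)
no_edit_score = diff_reward(original, original, reference)
```

Under these assumptions, a prediction that makes the reference change plus an unrelated one earns partial credit rather than zero, which is the kind of graded signal that an exact-match reward cannot provide.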
Anthology ID:
2026.findings-acl.1407
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
28219–28234
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1407/
Cite (ACL):
Wuya Chen, Yihao Yang, and Yue Lin. 2026. CoT-Edit: Reinforcement Learning of Chain-of-Thought Reasoning for Code Edit Suggestion. In Findings of the Association for Computational Linguistics: ACL 2026, pages 28219–28234, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
CoT-Edit: Reinforcement Learning of Chain-of-Thought Reasoning for Code Edit Suggestion (Chen et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1407.pdf
Checklist:
 2026.findings-acl.1407.checklist.pdf