rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection

Sijia Chen, Baochun Li, Di Niu


Abstract
Large language models (LLMs) are post-trained through reinforcement learning (RL) to evolve into Reasoning Language Models (RLMs), and the hallmark of this advanced reasoning is the “aha” moment, when a model starts to apply strategies such as self-reflection and deep thinking within its chains of thought (CoTs). Motivated by this, this paper proposes a novel reinforced strategy injection mechanism (rSIM), which enables any LLM to become an RLM by employing a small planner to guide the LLM’s CoT through the adaptive injection of reasoning strategies. To achieve this, the planner (leader agent) is jointly trained with an LLM (follower agent) using multi-agent RL (MARL), based on a leader-follower framework and straightforward rule-based rewards. Experimental results show that rSIM enables Qwen2.5-0.5B to become an RLM and significantly outperform Qwen2.5-14B across mathematical, coding, and financial reasoning tasks. Moreover, the planner is generalizable: it needs to be trained only once and can be applied as a plug-in to substantially improve the reasoning capabilities of existing LLMs. In addition, the planner supports continual learning across various tasks, allowing its planning abilities to gradually improve and generalize to a wider range of problems. Our source code is available under examples/rSIM in https://github.com/AgenticFinLab/eparl.
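The leader-follower loop described above can be illustrated with a toy sketch. This is not the authors' implementation (that lives under examples/rSIM in the linked repository). It is a minimal sketch under stated assumptions: the planner is a hypothetical softmax policy over a small strategy vocabulary, the follower is a stub that only tags each CoT step with the injected strategy, the rule-based reward is a plain exact-match check on the final answer, and the leader is updated with REINFORCE plus a moving-average baseline, a swapped-in technique chosen for brevity. Unlike rSIM, only the leader is trained here; the follower LLM update is omitted. The names `ToyPlanner`, `toy_follower`, `toy_answer`, `rule_based_reward`, `rollout`, and `train` are all hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

# Illustrative strategy vocabulary. "self-reflection" and "deep thinking" are the
# examples named in the abstract; "decompose" and "verify" are hypothetical.
STRATEGIES = ["self-reflection", "deep thinking", "decompose", "verify"]


@dataclass
class ToyPlanner:
    """Leader agent: a softmax policy over STRATEGIES (stand-in for a small planner model)."""
    logits: list = field(default_factory=lambda: [0.0] * len(STRATEGIES))

    def probs(self):
        # Numerically stable softmax over the strategy logits.
        m = max(self.logits)
        exps = [math.exp(x - m) for x in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, rng):
        # Sample the index of the strategy to inject at the next reasoning step.
        return rng.choices(range(len(STRATEGIES)), weights=self.probs())[0]

    def reinforce(self, actions, advantage, lr=0.5):
        # REINFORCE update: d log softmax(a) / d logits = onehot(a) - probs.
        for a in actions:
            p = self.probs()
            for i in range(len(self.logits)):
                self.logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - p[i])


def toy_follower(question, strategy, step):
    """Follower agent stub: emits one CoT step conditioned on the injected strategy."""
    return f"[{strategy}] step {step}: reasoning about {question!r}"


def toy_answer(actions):
    # Toy environment assumption: the follower reaches the right answer only
    # if "self-reflection" (index 0) was injected at least once.
    return "4" if 0 in actions else "5"


def rule_based_reward(final_answer, reference):
    # Rule-based reward: 1.0 for an exact-match final answer, else 0.0.
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def rollout(planner, question, reference, rng, num_steps=3):
    """One episode: the leader picks a strategy per step, the follower extends the CoT."""
    cot, actions = [], []
    for step in range(num_steps):
        a = planner.act(rng)
        actions.append(a)
        cot.append(toy_follower(question, STRATEGIES[a], step))
    reward = rule_based_reward(toy_answer(actions), reference)
    return cot, actions, reward


def train(episodes=300, seed=0):
    """Update only the leader with REINFORCE and a moving-average baseline."""
    rng = random.Random(seed)
    planner = ToyPlanner()
    baseline = 0.0
    for _ in range(episodes):
        _, actions, reward = rollout(planner, "2 + 2 = ?", "4", rng)
        planner.reinforce(actions, reward - baseline)
        baseline = 0.9 * baseline + 0.1 * reward
    return planner


if __name__ == "__main__":
    trained = train()
    for name, p in zip(STRATEGIES, trained.probs()):
        print(f"{name:>15}: {p:.3f}")
```

Because the toy environment rewards only episodes in which "self-reflection" was injected, training should shift the planner's probability mass toward that strategy. This mirrors, in miniature, how a shared rule-based reward can teach a leader which strategies to inject without any step-level supervision.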
Anthology ID: 2026.acl-long.2054
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 44389–44405
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2054/
Cite (ACL): Sijia Chen, Baochun Li, and Di Niu. 2026. rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44389–44405, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection (Chen et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2054.pdf
Checklist: 2026.acl-long.2054.checklist.pdf