Agentic-R1: Distilled Dual-Strategy Reasoning

Weihua Du, Pranjal Aggarwal, Sean Welleck, Yiming Yang


Abstract
Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, **DualDistill**, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train **Agentic-R1**, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems and using text-based reasoning for abstract ones. Our method improves accuracy on computation-intensive tasks and reduces inference latency on standard benchmarks, demonstrating the promise of multi-strategy distillation for robust and efficient reasoning.
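The abstract describes the approach only at a high level. As a rough, hedged illustration of the multi-teacher distillation idea (not the paper's actual pipeline), the sketch below composes a student training set from a tool-using teacher and a text-reasoning teacher, keeping the traces where each strategy succeeded; all names here (`Trace`, `distill_dataset`, `tool_teacher`, `text_teacher`) are illustrative assumptions.

```python
# Illustrative sketch only: DualDistill's real trajectory composition and
# training recipe are defined in the paper; the names below are assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trace:
    question: str
    strategy: str      # "tool" (code execution) or "text" (long-CoT reasoning)
    trajectory: str    # full reasoning / tool-call trace produced by a teacher
    correct: bool      # whether the teacher reached the reference answer

def distill_dataset(questions: List[str],
                    tool_teacher: Callable[[str], Trace],
                    text_teacher: Callable[[str], Trace]) -> List[Trace]:
    """Query both teachers and keep only successful traces, so the student
    sees worked examples of whichever strategy handles each problem well."""
    data: List[Trace] = []
    for q in questions:
        for teacher in (tool_teacher, text_teacher):
            trace = teacher(q)
            if trace.correct:
                data.append(trace)
    return data
```

Fine-tuning a student on such a mixed dataset is one plausible way to obtain a model that can choose between tool invocation and text-based reasoning per query, as the abstract describes.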
Anthology ID:
2025.emnlp-main.604
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rosé, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
12040–12054
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.604/
Cite (ACL):
Weihua Du, Pranjal Aggarwal, Sean Welleck, and Yiming Yang. 2025. Agentic-R1: Distilled Dual-Strategy Reasoning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12040–12054, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Agentic-R1: Distilled Dual-Strategy Reasoning (Du et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.604.pdf
Checklist:
 2025.emnlp-main.604.checklist.pdf