Agentic-R1: Distilled Dual-Strategy Reasoning

Weihua Du, Pranjal Aggarwal, Sean Welleck, Yiming Yang


Abstract
Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, **DualDistill**, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train **Agentic-R1**, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems and using text-based reasoning for abstract ones. Our method improves accuracy on computation-intensive tasks and reduces inference latency on standard benchmarks, demonstrating the promise of multi-strategy distillation for robust and efficient reasoning.
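The abstract describes the approach only at a high level. As a rough, hedged illustration of the multi-teacher distillation idea (not the paper's actual pipeline), the sketch below composes a student training set from a tool-using teacher and a text-reasoning teacher, keeping the traces where each strategy succeeded; all names here (`Trace`, `distill_dataset`, `tool_teacher`, `text_teacher`) are illustrative assumptions.

```python
# Illustrative sketch only: DualDistill's real trajectory composition and
# training recipe are defined in the paper; the names below are assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trace:
    question: str
    strategy: str      # "tool" (code execution) or "text" (long-CoT reasoning)
    trajectory: str    # full reasoning / tool-call trace produced by a teacher
    correct: bool      # whether the teacher reached the reference answer

def distill_dataset(questions: List[str],
                    tool_teacher: Callable[[str], Trace],
                    text_teacher: Callable[[str], Trace]) -> List[Trace]:
    """Query both teachers and keep only successful traces, so the student
    sees worked examples of whichever strategy handles each problem well."""
    data: List[Trace] = []
    for q in questions:
        for teacher in (tool_teacher, text_teacher):
            trace = teacher(q)
            if trace.correct:
                data.append(trace)
    return data
```

Fine-tuning a student on such a mixed dataset is one plausible way to obtain a model that can choose between tool invocation and text-based reasoning per query, as the abstract describes.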
Anthology ID:
2025.emnlp-main.604
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rosé, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
12040–12054
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.604/
Cite (ACL):
Weihua Du, Pranjal Aggarwal, Sean Welleck, and Yiming Yang. 2025. Agentic-R1: Distilled Dual-Strategy Reasoning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12040–12054, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Agentic-R1: Distilled Dual-Strategy Reasoning (Du et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.604.pdf
Checklist:
 2025.emnlp-main.604.checklist.pdf