Guokai Tang


2026

Large language models (LLMs) show strong reasoning and decision-making ability, but their high inference cost motivates transferring agentic skills to small language models (SLMs). Agent distillation trains an SLM on full reason–act–observe trajectories from a tool-using teacher so that the student acquires the teacher’s tool-use capabilities. However, teacher trajectories vary widely in how compatible they are with the student, and some are simply hard for it to learn; moreover, a uniform token-level loss does not prioritize the tool-use patterns and final decisions that drive successful reasoning. We therefore propose SmartAD, a capacity-aligned agent distillation framework that improves both the distilled data and the supervision signal. SmartAD (i) selects, for each training example, the trajectory with the minimum negative log-likelihood among multiple correct teacher samples to obtain student-friendly training data, and (ii) applies a segment-weighted loss that emphasizes action-execution and final-decision spans over intermediate reasoning. Experiments on multi-hop QA and math benchmarks with 1.5B and 3B models show that SmartAD consistently outperforms all baselines. Overall, trajectory selection and segment-weighted supervision let small models learn the teacher’s capabilities more easily and efficiently, achieving capacity-aligned distillation.
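
As a rough illustration of the two components described above, the following Python sketch operates on per-token log-probabilities that are assumed to have already been computed by the student model. It is not the authors' implementation. The function names, the segment labels, and the weight values are hypothetical, and the abstract does not say whether the negative log-likelihood is summed or length-normalized; the sketch sums it.

```python
from typing import Dict, List, Sequence

# Hypothetical weights: the abstract states only that action and
# final-decision spans are emphasized over intermediate reasoning.
SEGMENT_WEIGHTS: Dict[str, float] = {"reasoning": 1.0, "action": 2.0, "final": 2.0}


def sequence_nll(token_logprobs: Sequence[float]) -> float:
    """Summed negative log-likelihood of one trajectory.

    Assumption: the log-probabilities come from the student model, and the
    NLL is a plain sum rather than a per-token average.
    """
    return -sum(token_logprobs)


def select_trajectory(candidates: List[Sequence[float]]) -> int:
    """Return the index of the candidate trajectory with the lowest NLL.

    Each candidate is the list of per-token log-probabilities for one
    correct teacher trajectory of the same training example.
    """
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    return min(range(len(candidates)), key=lambda i: sequence_nll(candidates[i]))


def segment_weighted_loss(
    token_logprobs: Sequence[float],
    segments: Sequence[str],
    weights: Dict[str, float] = SEGMENT_WEIGHTS,
) -> float:
    """Weighted average of per-token NLL, with one segment label per token.

    Normalizing by the total weight is an assumption of this sketch; with
    equal weights it reduces to the usual mean token-level loss.
    """
    if len(token_logprobs) != len(segments):
        raise ValueError("token_logprobs and segments must have equal length")
    token_weights = [weights[label] for label in segments]
    total_weight = sum(token_weights)
    weighted_nll = -sum(w * lp for w, lp in zip(token_weights, token_logprobs))
    return weighted_nll / total_weight
```

In a training loop one would call `select_trajectory` once per example to build the distillation set, then use `segment_weighted_loss` in place of the uniform token-level loss on the selected trajectory.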