Mingyuan Fan


2026

Despite the advanced capabilities of Large Language Models (LLMs), training specialized reasoning models for the medical domain remains a significant challenge because high-quality, large-scale Chain-of-Thought (CoT) data is scarce. Moreover, the intermediate reasoning steps in teacher-generated CoT data can be redundant and noisy, leading models to acquire spurious patterns and resulting in suboptimal performance. To address these issues, we propose MedCoach, a novel framework that introduces a dedicated coach role to guide the student model through question decomposition, thereby smoothing its learning curve in medical reasoning. The framework employs a curriculum-oriented warm-up on simplified sub-questions, which facilitates domain adaptation before the student advances to complex long-chain reasoning. To ensure the fidelity of the intermediate CoT signals, we augment this phase with medical knowledge graphs to suppress factual drift and mitigate reasoning noise at a granular level. Subsequently, we introduce a targeted factual perturbation mechanism that trains the model to discriminate between valid use of facts and subtle factual misapplications. Extensive experiments across diverse benchmarks demonstrate notable improvements over existing methods, validating the effectiveness of MedCoach.