Zixiong Yu
2026
Dynamic Sampling that Adapts: Self-Aware Iterative Data Persistent Optimization for Mathematical Reasoning
Jun Rao | Xuebo Liu | Hexuan Deng | Zepeng Lin | Zixiong Yu | Jiansheng Wei | Xiaojun Meng | Min Zhang
Findings of the Association for Computational Linguistics: ACL 2026
In mathematical reasoning, data selection strategies predominantly rely on static, externally defined metrics, which fail to adapt to the evolving capabilities of models during training. This misalignment limits the efficiency of Supervised Fine-Tuning and Reinforcement Learning. To bridge this gap, we introduce SAI-DPO (Self-Aware Iterative Data Persistent Optimization), a dynamic sampling framework that aligns training data with the model’s intrinsic competence. SAI-DPO operationalizes two novel metrics: Knowledge Semantic Alignment, which targets domain weaknesses, and Self-Aware Difficulty, which is derived from pass rates and reasoning path characteristics and gauges instance complexity relative to the model’s current state. By iteratively recalibrating the data distribution based on real-time feedback, SAI-DPO keeps training samples matched to the model’s current capability level as that capability evolves. Extensive experiments on eight benchmarks (including AIME24 and AMC23) demonstrate that SAI-DPO outperforms static baselines by up to nearly 6 points, achieving state-of-the-art efficiency with significantly less data.
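To make pass-rate-driven sampling concrete, here is a minimal Python sketch under stated assumptions. The abstract gives no formulas, so everything below is hypothetical and is not the SAI-DPO implementation: `self_aware_difficulty` blends the failure rate with a normalized reasoning-length term using an assumed weight `alpha`, a per-topic error rate stands in for Knowledge Semantic Alignment, and a Gaussian bump around an assumed `target` difficulty favors problems near the model's current frontier.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Problem:
    pid: str
    topic: str       # hypothetical knowledge-point label
    passes: int      # correct rollouts out of `attempts`
    attempts: int
    mean_len: float  # mean reasoning-path length in tokens (assumed feature)


def self_aware_difficulty(p: Problem, max_len: float, alpha: float = 0.8) -> float:
    """Blend failure rate with normalized reasoning length (alpha is an assumed weight)."""
    fail_rate = 1.0 - p.passes / p.attempts
    length_term = min(p.mean_len / max_len, 1.0)
    return alpha * fail_rate + (1.0 - alpha) * length_term


def topic_weakness(problems: list[Problem]) -> dict[str, float]:
    """Per-topic error rate: a crude stand-in for targeting domain weaknesses."""
    counts: dict[str, list[int]] = {}
    for p in problems:
        c = counts.setdefault(p.topic, [0, 0])
        c[0] += p.attempts - p.passes
        c[1] += p.attempts
    return {topic: err / n for topic, (err, n) in counts.items()}


def sampling_weights(problems: list[Problem], target: float = 0.5,
                     width: float = 0.2) -> dict[str, float]:
    """Normalized weights favoring near-target difficulty in weak topics."""
    max_len = max(p.mean_len for p in problems)
    weak = topic_weakness(problems)
    raw = {}
    for p in problems:
        d = self_aware_difficulty(p, max_len)
        # Gaussian bump: highest weight when d is close to the target difficulty
        closeness = math.exp(-((d - target) ** 2) / (2 * width ** 2))
        raw[p.pid] = closeness * (0.5 + weak[p.topic])  # 0.5 floor is arbitrary
    total = sum(raw.values())
    return {pid: w / total for pid, w in raw.items()}


def sample_batch(problems: list[Problem], k: int, seed: int = 0) -> list[str]:
    """Draw k problem ids (with replacement) according to the current weights."""
    weights = sampling_weights(problems)
    ids = list(weights)
    rng = random.Random(seed)
    return rng.choices(ids, weights=[weights[i] for i in ids], k=k)


pool = [
    Problem("a", "algebra", passes=8, attempts=8, mean_len=300.0),    # already solved
    Problem("b", "algebra", passes=4, attempts=8, mean_len=900.0),    # near the frontier
    Problem("c", "geometry", passes=0, attempts=8, mean_len=1200.0),  # currently out of reach
]
print(sampling_weights(pool))
print(sample_batch(pool, k=5))
```

With these toy numbers, problem `b` (half its rollouts pass) receives the largest share of the sampling mass. Re-measuring `passes` after each training round and resampling is what would make such a loop iterative; the Gaussian bump is one simple choice that down-weights both problems the model already solves and problems it cannot yet solve at all.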
MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis
Zixiong Yu | Jun Rao | Guhan Chen | Songtao Tian | Bohan Li | Jiansheng Wei | Min Zhang | Xiaojun Meng
Findings of the Association for Computational Linguistics: ACL 2026
Synthesizing high-quality mathematical reasoning data without human priors remains a significant challenge. Current approaches typically rely on seed data mutation or simple prompt engineering, and often suffer from mode collapse and limited logical complexity. This paper proposes a hierarchical synthesis framework that formulates data synthesis as an unsupervised optimization problem over a constraint graph followed by semantic instantiation, rather than as a direct text generation task. We introduce a Legislator-Executor paradigm: the Legislator adversarially evolves structured generation blueprints that encode the problem's constraints, while the Executor instantiates these specifications into diverse natural language scenarios. Decoupling skeleton design from linguistic realization lets the framework prioritize constructing complex and diverse logical structures, which in turn guides high-quality data synthesis. Experiments on a total of 10 models across the Qwen, Llama, Mistral, and Gemma series show that models fine-tuned on 1K synthesized samples outperform widely used datasets of comparable scale (LIMO, s1K) across eight mathematical benchmarks and exhibit superior out-of-distribution generalization.
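The Legislator-Executor split can be illustrated with a toy Python sketch. All names here (`Node`, `legislate_step`, `adversarial_evolve`, `execute`) are hypothetical, and the paper's actual constraint graphs, evolution operators, and instantiation step are not described in the abstract. In this sketch a blueprint is a small arithmetic DAG, the "Legislator" deepens it until an assumed solver capacity (measured crudely as graph depth) is exceeded, and the "Executor" renders the finished graph as text around a scenario noun.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
PHRASES = {"add": "the sum", "sub": "the difference", "mul": "the product"}


@dataclass
class Node:
    name: str
    op: str                      # "given" or a key of OPS
    inputs: Tuple[str, ...] = ()
    value: Optional[int] = None  # set only for "given" nodes


Blueprint = Dict[str, Node]


def evaluate(bp: Blueprint, name: str) -> int:
    """Compute the ground-truth answer directly from the constraint graph."""
    node = bp[name]
    if node.op == "given":
        return node.value
    a, b = (evaluate(bp, i) for i in node.inputs)
    return OPS[node.op](a, b)


def depth(bp: Blueprint, name: str) -> int:
    """Longest derivation chain, used here as a crude complexity measure."""
    node = bp[name]
    return 0 if node.op == "given" else 1 + max(depth(bp, i) for i in node.inputs)


def legislate_step(bp: Blueprint, target: str, rng: random.Random) -> str:
    """Hypothetical Legislator move: add a new given and a new constraint on top of target."""
    g = f"g{len(bp)}"
    bp[g] = Node(g, "given", value=rng.randint(2, 9))
    t = f"t{len(bp)}"
    bp[t] = Node(t, rng.choice(["add", "mul"]), (target, g))
    return t


def adversarial_evolve(bp: Blueprint, target: str, solver_capacity: int,
                       rng: random.Random, max_steps: int = 10) -> str:
    """Deepen the blueprint until it exceeds an assumed solver capacity."""
    for _ in range(max_steps):
        if depth(bp, target) > solver_capacity:  # stand-in for "the solver fails"
            break
        target = legislate_step(bp, target, rng)
    return target


def execute(bp: Blueprint, target: str, scenario: str) -> str:
    """Hypothetical Executor: render the finished graph as a word problem."""
    parts = []
    for node in bp.values():  # insertion order is already topological
        if node.op == "given":
            parts.append(f"{node.name} is {node.value} {scenario}.")
        else:
            a, b = node.inputs
            parts.append(f"{node.name} is {PHRASES[node.op]} of {a} and {b}.")
    parts.append(f"What is {target}?")
    return " ".join(parts)


rng = random.Random(0)
bp: Blueprint = {
    "x": Node("x", "given", value=3),
    "y": Node("y", "given", value=4),
    "t": Node("t", "add", ("x", "y")),
}
target = adversarial_evolve(bp, "t", solver_capacity=3, rng=rng)
print(execute(bp, target, "apples"))
print("answer:", evaluate(bp, target))
```

Because the answer is computed from the graph rather than from the rendered text, the same blueprint could be re-rendered in many scenarios without changing its logic, which mirrors the decoupling of skeleton design from linguistic realization that the abstract describes.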