Yuhui Wang


2025

Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning
Senjie Jin | Lu Chen | Zhiheng Xi | Yuhui Wang | Sirui Song | Yuhao Zhou | Xinbo Zhang | Peng Sun | Hong Lu | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Natural language chain-of-thought (N-CoT) and program chain-of-thought (P-CoT) have emerged as two primary paradigms for large language models (LLMs) to solve mathematical reasoning problems. Current research typically pursues unidirectional enhancement: using P-CoT to enhance N-CoT, or N-CoT to enhance P-CoT. In this paper, we seek to fully unleash the strengths of the two paradigms for mutual enhancement and ultimately achieve simultaneous improvements. We conduct a detailed analysis of the error types across the two paradigms, based on which we propose Parrot, a novel training pipeline for mathematical problems: 1) three target-designed subtasks that integrate sequential P-CoT and N-CoT generation; 2) a subtask hybrid training strategy to facilitate natural language semantic transferability; 3) a converted N-CoT auxiliary reward designed to alleviate the sparse rewards in P-CoT optimization. Extensive experiments demonstrate that Parrot significantly improves the performance of both N-CoT and P-CoT, especially N-CoT. With Parrot SFT, the N-CoT performance of LLaMA2 and CodeLLaMA achieves gains of +21.87 and +21.48 on MathQA over the resource-intensive RL baseline.
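The abstract mentions an auxiliary reward derived from the converted N-CoT to densify the otherwise sparse P-CoT optimization signal. The sketch below is a minimal illustration of that general idea, not the paper's actual formulation: the reward functions, the token-overlap metric, and the `aux_weight` mixing coefficient are all assumptions introduced here for illustration.

```python
# Hypothetical sketch: mixing a sparse P-CoT execution reward with a dense
# auxiliary reward computed from the converted N-CoT. Names and weights are
# illustrative assumptions, not the Parrot paper's exact design.

def pcot_execution_reward(program: str, gold_answer: float) -> float:
    """Sparse reward: 1.0 only if the program runs and its `answer` matches."""
    try:
        namespace: dict = {}
        exec(program, namespace)  # program is expected to define `answer`
        return 1.0 if abs(float(namespace["answer"]) - gold_answer) < 1e-6 else 0.0
    except Exception:
        return 0.0

def ncot_auxiliary_reward(converted_ncot: str, reference_ncot: str) -> float:
    """Dense auxiliary signal: crude token-overlap score between the N-CoT
    converted from the P-CoT and a reference rationale (placeholder metric)."""
    pred, ref = set(converted_ncot.split()), set(reference_ncot.split())
    return len(pred & ref) / max(len(ref), 1)

def combined_reward(program: str, converted_ncot: str,
                    gold_answer: float, reference_ncot: str,
                    aux_weight: float = 0.3) -> float:
    """Blend the sparse execution reward with the dense auxiliary reward."""
    return (pcot_execution_reward(program, gold_answer)
            + aux_weight * ncot_auxiliary_reward(converted_ncot, reference_ncot))
```

Even when the program fails to produce the correct answer, the auxiliary term still provides a nonzero gradient signal, which is the motivation the abstract gives for the converted N-CoT reward.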

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
Ming Zhang | Yuhui Wang | Yujiong Shen | Tingyi Yang | Changhao Jiang | Yilong Wu | Shihan Dou | Qinhao Chen | Zhiheng Xi | Zhihao Zhang | Yi Dong | Zhen Wang | Zhihui Fei | Mingyang Wan | Tao Liang | Guojun Ma | Qi Zhang | Tao Gui | Xuanjing Huang
Findings of the Association for Computational Linguistics: ACL 2025

Process-driven dialogue systems, which operate under strict, predefined process constraints, are essential in customer service and equipment maintenance scenarios. Although Large Language Models (LLMs) have shown remarkable progress in dialogue and reasoning, they still struggle with these strictly constrained dialogue tasks. To address this challenge, we construct the Process Flow Dialogue (PFDial) dataset, which contains 12,705 high-quality Chinese dialogue instructions derived from 440 flowcharts containing 5,055 process nodes. Based on the PlantUML specification, each UML flowchart is converted into atomic dialogue units, i.e., structured five-tuples. Experimental results demonstrate that both a 7B model trained with merely 800 samples and a 0.5B model trained on the full data can surpass 90% accuracy. Additionally, the 8B model can surpass GPT-4o by up to 43.88%, with an average improvement of 11.00%. We further evaluate models' performance on challenging backward transitions in process flows and conduct an in-depth analysis of various dataset formats to reveal their impact on model performance in handling decision and sequential branches. The data is released at https://github.com/KongLongGeFDU/PFDial.
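The abstract describes converting PlantUML flowcharts into atomic dialogue units ("structured five-tuples") but does not specify the schema. The sketch below is a hypothetical illustration of such a conversion: the five fields, the regular expression, and the sample flowchart are assumptions made for this example, not the PFDial construction procedure.

```python
# Illustrative sketch: turning PlantUML transitions into atomic dialogue units.
# The five-tuple fields and the toy flowchart below are hypothetical; the paper
# only states that each flowchart is decomposed into structured five-tuples.

import re
from typing import List, NamedTuple

class DialogueUnit(NamedTuple):
    source_node: str    # current process node
    user_input: str     # condition / user reply that triggers the transition
    target_node: str    # next process node
    system_action: str  # what the system should say or do at the target node
    branch_type: str    # "decision" or "sequential"

def parse_plantuml_transitions(uml: str) -> List[DialogueUnit]:
    """Extract `A --> B : label` edges from a PlantUML-style diagram."""
    units = []
    for src, dst, label in re.findall(r"(\w+)\s*-->\s*(\w+)\s*(?::\s*(.*))?", uml):
        branch = "decision" if label else "sequential"
        units.append(DialogueUnit(src, label.strip(), dst,
                                  f"Proceed to {dst}", branch))
    return units

uml_snippet = """
Start --> CheckWarranty : device under warranty?
CheckWarranty --> FreeRepair : yes
CheckWarranty --> QuoteCost : no
"""

for unit in parse_plantuml_transitions(uml_snippet):
    print(unit)
```

Flattening each edge into one self-contained unit is what makes it possible to train on small subsets (e.g., the 800-sample setting reported in the abstract) while still covering both decision and sequential branches.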