Xiangyu Wen
2025
Dyve: Thinking Fast and Slow for Dynamic Process Verification
Jianyuan Zhong
|
Zeju Li
|
Zhijian Xu
|
Xiangyu Wen
|
Qiang Xu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models have advanced significantly in complex reasoning, often leveraging external reward model to improve the reliability of their multi-step processes. However, existing process verification methods struggle with reliably assessing incomplete reasoning traces and are limited by the cost of high-quality human annotations or the inherent noise in automatically generated labels. Therefore, we present Dyve, a dynamic process verifier that enhances reasoning error detection in large language models by integrating fast and slow thinking, inspired by Kahneman’s Systems Theory. Dyve adaptively applies immediate token-level confirmation (System 1) for straightforward steps and comprehensive analysis (System 2) for complex ones. Unlike traditional verifiers that only evaluate final outputs, Dyve employs a step-wise consensus-filtered supervision strategy, leveraging Monte Carlo estimation, LLM-as-a-Judge, and specialized reasoning models to extract high-quality training signals from noisy rollouts. Experimental results on ProcessBench and the MATH dataset confirm that Dyve significantly outperforms existing process-based verifiers and boosts performance in Best-of-N settings while maintaining computational efficiency by strategically allocating verification resources.
Guideline Compliance in Task-Oriented Dialogue: The Chained Prior Approach
Xiangyu Wen
|
Jianyuan Zhong
|
Zhijian Xu
|
Qiang Xu
Findings of the Association for Computational Linguistics: NAACL 2025
Task-oriented dialogue (TOD) systems are widely used across various domains, including customer service, appointment scheduling, and technical support. In real-world scenarios, such systems must adhere to given operational guidelines. However, existing solutions based on large language models often cannot achieve strict guideline compliance, even when fine-tuned with domain knowledge. To address this issue, we introduce a novel TOD system named GuidedTOD, which explicitly considers domain-specific guidelines by integrating a policy module. This module employs a Markov Chain, termed Chained Prior, to efficiently encode and dynamically update guideline knowledge. During inference, the Chained Prior re-ranks outputs from the domain-expert language model using beam search, ensuring guideline adherence. Experimental results show that GuidedTOD significantly improves guideline compliance, achieving approximately 20% better action prediction accuracy than state-of-the-art solutions. Code is available here: https://github.com/cure-lab/GuidedTOD.