Jianyuan Zhong

2026

Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier
Jianyuan Zhong | Zeju Li | Zhijian Xu | Xiangyu Wen | Kezhi Li | Qiang Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Complex reasoning with Large Language Models (LLMs) demands a careful balance between accuracy and computational cost. Verification is crucial for reliability but faces a trade-off: robust process-based verifiers are computationally prohibitive, while fast verifiers lack precision. We introduce flexive, a unified generative verifier designed to navigate this trade-off by dynamically allocating compute between rapid fast thinking and deliberative slow thinking. A key innovation is our training strategy: we use Group Relative Policy Optimization (GRPO) to specifically enhance the reliability of the fast mode. This targeted training generalizes effectively, elevating the slow mode to state-of-the-art open-source performance. To deploy flexive, we propose the solve-detect-verify (SDV) pipeline. Moving beyond static Best-of-N ranking, SDV employs an iterative refinement process that uses likelihood-based probing to detect solution completion, curtailing overthinking, and leverages flexive’s feedback for targeted correction. Solve-detect-verify establishes a new open-source state-of-the-art on ProcessBench, outperforming GenPRM-32B while requiring ~2.3x fewer TFLOPS and 15x less training data. On AIME 2024, the full SDV pipeline achieves 83.3% accuracy, surpassing strong baselines while using significantly fewer tokens.

2025

Guideline Compliance in Task-Oriented Dialogue: The Chained Prior Approach
Xiangyu Wen | Jianyuan Zhong | Zhijian Xu | Qiang Xu
Findings of the Association for Computational Linguistics: NAACL 2025

Task-oriented dialogue (TOD) systems are widely used across various domains, including customer service, appointment scheduling, and technical support. In real-world scenarios, such systems must adhere to given operational guidelines. However, existing solutions based on large language models often cannot achieve strict guideline compliance, even when fine-tuned with domain knowledge. To address this issue, we introduce a novel TOD system named GuidedTOD, which explicitly considers domain-specific guidelines by integrating a policy module. This module employs a Markov Chain, termed Chained Prior, to efficiently encode and dynamically update guideline knowledge. During inference, the Chained Prior re-ranks outputs from the domain-expert language model using beam search, ensuring guideline adherence. Experimental results show that GuidedTOD significantly improves guideline compliance, achieving approximately 20% better action prediction accuracy than state-of-the-art solutions. Code is available here: https://github.com/cure-lab/GuidedTOD.

Dyve: Thinking Fast and Slow for Dynamic Process Verification
Jianyuan Zhong | Zeju Li | Zhijian Xu | Xiangyu Wen | Qiang Xu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large Language Models have advanced significantly in complex reasoning, often leveraging external reward models to improve the reliability of their multi-step processes. However, existing process verification methods struggle to reliably assess incomplete reasoning traces and are limited by the cost of high-quality human annotations or the inherent noise in automatically generated labels. Therefore, we present Dyve, a dynamic process verifier that enhances reasoning error detection in large language models by integrating fast and slow thinking, inspired by Kahneman’s Systems Theory. Dyve adaptively applies immediate token-level confirmation (System 1) for straightforward steps and comprehensive analysis (System 2) for complex ones. Unlike traditional verifiers that only evaluate final outputs, Dyve employs a step-wise consensus-filtered supervision strategy, leveraging Monte Carlo estimation, LLM-as-a-Judge, and specialized reasoning models to extract high-quality training signals from noisy rollouts. Experimental results on ProcessBench and the MATH dataset confirm that Dyve significantly outperforms existing process-based verifiers and boosts performance in Best-of-N settings while maintaining computational efficiency by strategically allocating verification resources.

2019

UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
Md Kamrul Hasan | Wasifur Rahman | AmirAli Bagher Zadeh | Jianyuan Zhong | Md Iftekhar Tanveer | Louis-Philippe Morency | Mohammed (Ehsan) Hoque
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Humor is a unique and creative communicative behavior often displayed during social interactions. It is produced in a multimodal manner, through the use of words (text), gestures (visual) and prosodic cues (acoustic). Understanding humor from these three modalities falls within the boundaries of multimodal language, a recent research trend in natural language processing that models natural language as it happens in face-to-face communication. Although humor detection is an established research area in NLP, it has been understudied in a multimodal context. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding multimodal language used in expressing humor. The dataset and accompanying studies present a framework for multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research.

Co-authors

Md Kamrul Hasan 1

Mohammed (Ehsan) Hoque 1

Kezhi Li 1

Louis-Philippe Morency 1

Wasifur Rahman 1

Md Iftekhar Tanveer 1
