Junfeng Wang
2026
Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning
Yue Guo | Fanfu Wang | Jianwei Lv | Xincheng Shi | Yuchen Li | Youya Wang | Yunsheng Zeng | Yujing Liu | Yunhao Qiao | Gen Li | Junfeng Wang | Bo Yuan
Findings of the Association for Computational Linguistics: ACL 2026
Clinical Decision Support Systems (CDSSs) provide reasoning and inquiry guidance for physicians, yet they face notable challenges, including high maintenance costs and low generalization capability. Recently, Large Language Models (LLMs) have been widely adopted in healthcare due to their extensive knowledge reserves, retrieval, and communication capabilities. While LLMs show promise and excel at medical benchmarks, their diagnostic reasoning and inquiry skills are constrained. To mitigate this issue, we propose (1) a Clinical Diagnostic Reasoning Data (CDRD) structure to capture abstract clinical reasoning logic, and a pipeline for its construction, and (2) Dr. Assistant, a clinical diagnostic model equipped with clinical reasoning and inquiry skills. Its training involves a two-stage process: supervised fine-tuning (SFT), followed by reinforcement learning (RL) with a tailored reward function. We also introduce a benchmark to evaluate both diagnostic reasoning and inquiry. Our experiments demonstrate that Dr. Assistant outperforms open-source models and achieves performance competitive with closed-source models, providing an effective solution for clinical diagnostic inquiry guidance. Project information can be found at: https://github.com/YGswu/Dr.-Assistant.
DORA: A Dual-Objective Reinforcement Learning Framework for Effective and Efficient Multimodal Agentic Search
Guangming Qin | Yuhao Deng | Yukun Zhao | Zhenyang Li | Junfeng Wang | Dawei Yin | Ye Yuan | Guoren Wang | Yizhou Yan | Chengliang Chai | Lei Cao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent research uses reinforcement learning (RL) to post-train Multi-modal Large Language Models (MLLMs) so that these models can iteratively call search engines to dynamically access external knowledge when handling complex Visual Question Answering (VQA) tasks. However, existing methods face two major limitations in effectiveness and efficiency: i) For effectiveness, the objective of these methods, which only considers the correctness of the generated final response, overlooks the quality of intermediate search results, thus leading to suboptimal search strategies. ii) For efficiency, existing methods often invoke search calls unnecessarily during reasoning, making inference inefficient. To address these issues, we propose DORA, a customized dual-objective reinforcement learning framework to improve the search strategies of MLLMs, enhancing their search quality while minimizing search frequency. The key ideas include (1) a reward function that promotes correct reasoning trajectories with fewer search calls; and (2) dual optimization objectives that jointly optimize search quality and answer correctness. Extensive experiments on 3 real-world datasets demonstrate that DORA outperforms state-of-the-art methods, achieving up to 8.4% higher accuracy while reducing the number of search calls by 9.7%.
2025
CTR-Guided Generative Query Suggestion in Conversational Search
Erxue Min | Hsiu-Yuan Huang | Xihong Yang | Min Yang | Xin Jia | Yunfang Wu | Hengyi Cai | Junfeng Wang | Shuaiqiang Wang | Dawei Yin
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Generating effective query suggestions in conversational search requires aligning model outputs with user click preferences. However, directly optimizing for these preferences is difficult because click signals are sparse and inherently noisy. To address this, we propose Generative Query Suggestion (GQS), a generative framework that leverages click modeling to denoise implicit feedback and enables reliable preference optimization for improving real-world user engagement. GQS consists of three key components: (1) a Multi-Source CTR Modeling module that captures diverse contextual signals to estimate fine-grained click-through rates, thereby constructing more reliable user click-preference pairs; (2) a Diversity-Aware Preference Alignment strategy using CTR-weighted Direct Preference Optimization (DPO), which balances relevance and semantic diversity; and (3) a CTR-Calibrated Iterative Optimization process that jointly refines both the CTR model and the query suggestion model across training rounds, enabling effective data reuse. Experiments on two real-world tasks demonstrate that GQS outperforms strong baselines in CTR, relevance, and diversity.
Proactive Guidance of Multi-Turn Conversation in Industrial Search
Xiaoyu Li | Xiao Li | Li Gao | Yiding Liu | Xiaoyang Wang | Shuaiqiang Wang | Junfeng Wang | Dawei Yin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
The evolution of Large Language Models (LLMs) has significantly advanced multi-turn conversation systems, emphasizing the need for proactive guidance to enhance users’ interactions. However, these systems face challenges in dynamically adapting to shifts in users’ goals and maintaining low latency for real-time interactions. In the Baidu Search AI assistant, an industrial-scale multi-turn search system, we propose a novel two-phase framework to provide proactive guidance. The first phase, Goal-adaptive Supervised Fine-Tuning (G-SFT), employs a goal adaptation agent that dynamically adapts to user goal shifts and provides goal-relevant contextual information. G-SFT also incorporates scalable knowledge transfer to distill insights from LLMs into a lightweight model for real-time interaction. The second phase, Click-oriented Reinforcement Learning (C-RL), adopts a generate-rank paradigm, systematically constructs preference pairs from user click signals, and proactively improves click-through rates through more engaging guidance. This dual-phase architecture achieves complementary objectives: G-SFT ensures accurate goal tracking, while C-RL optimizes interaction quality through click signal-driven reinforcement learning. Extensive experiments demonstrate that our framework achieves 86.10% accuracy in offline evaluation (+23.95% over baseline) and 25.28% CTR in online deployment (149.06% relative improvement), while reducing inference latency by 69.55% through scalable knowledge distillation.
2024
VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization
Dongsheng Zhu | Xunzhu Tang | Weidong Han | Jinghui Lu | Yukun Zhao | Guoliang Xing | Junfeng Wang | Dawei Yin
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
This paper presents VisLingInstruct, a novel approach to advancing Multi-Modal Language Models (MMLMs) in zero-shot learning. Current MMLMs show impressive zero-shot abilities in multi-modal tasks, but their performance depends heavily on the quality of instructions. VisLingInstruct tackles this by autonomously evaluating and optimizing instructional texts through In-Context Learning, improving the synergy between visual perception and linguistic expression in MMLMs. Alongside this instructional advancement, we have also optimized the visual feature extraction modules in MMLMs, further augmenting their responsiveness to textual content. Our comprehensive experiments on MMLMs, based on FlanT5 and Vicuna, show that VisLingInstruct significantly improves zero-shot performance in visual multi-modal tasks. Notably, it achieves accuracy increases of 13.1% and 9% over the prior state-of-the-art on the TextVQA and HatefulMemes datasets, respectively. Our main code is available at https://github.com/Zhudongsheng75/VisLingInstruct
GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation
Wenjie Zhou | Zhenxin Ding | Xiaodong Zhang | Haibo Shi | Junfeng Wang | Dawei Yin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. However, for practical deployment, it is crucial to perform knowledge distillation to maintain high performance while operating under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student model performance, how can knowledge from multiple teacher models be effectively ensembled during this stage without the guidance of labels? We propose a novel algorithm, GOVERN, to tackle this issue. GOVERN has demonstrated significant improvements in both offline and online experiments, enabling the student model to achieve results comparable to those of teacher ensembles. Our experiments show that GOVERN requires only 1% of the ensemble method’s inference budget to achieve 99.5% of its performance. The proposed algorithm has been successfully deployed in a real-world commercial question-answering system, demonstrating its real-world applicability.
Co-authors
- Dawei Yin 5
- Shuaiqiang Wang 2
- Yukun Zhao 2
- Hengyi Cai 1
- Lei Cao 1
- Chengliang Chai 1
- Yuhao Deng 1
- Zhenxin Ding 1
- Li Gao 1
- Yue Guo 1
- Weidong Han 1
- Hsiu-Yuan Huang 1
- Xin Jia 1
- Yuchen Li 1
- Gen Li 1
- Zhenyang Li 1
- Xiaoyu Li 1
- Xiao Li 1
- Yujing Liu 1
- Yiding Liu 1
- Jinghui Lu 1
- Jianwei Lv 1
- Erxue Min 1
- Yunhao Qiao 1
- Guangming Qin 1
- Haibo Shi 1
- Xincheng Shi 1
- Xunzhu Tang 1
- Fanfu Wang 1
- Youya Wang 1
- Guoren Wang 1
- Xiaoyang Wang 1
- Yunfang Wu 1
- Guoliang Xing 1
- Yizhou Yan 1
- Xihong Yang 1
- Min Yang 1
- Bo Yuan 1
- Ye Yuan 1
- Yunsheng Zeng 1
- Xiaodong Zhang 1
- Wenjie Zhou 1
- Dongsheng Zhu 1