Yifan Wei
2026
DyBBT: Dynamic Balance via Bandit-inspired Targeting for Dialog Policy with Cognitive Dual Systems
Shuyu Zhang | Yifan Wei | Jialuo Yuan | Xinru Wang | Yanmin Zhu | Yujie Liu | Bin Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Task-oriented dialog systems often rely on static exploration strategies that do not adapt to dynamic dialog contexts, leading to inefficient exploration and suboptimal performance. We propose DyBBT, a novel dialog policy learning framework that formalizes the exploration challenge through a structured cognitive state space 𝒞 that captures dialog progression, user uncertainty, and slot dependency. DyBBT uses a bandit-inspired meta-controller that dynamically switches between a fast intuitive inference (System 1) and a slow deliberative reasoner (System 2) based on real-time cognitive states and visitation counts. Extensive experiments on single- and multi-domain benchmarks show that DyBBT achieves SOTA performance in success rate, efficiency, and generalization, with human evaluations confirming that its decisions are well-aligned with expert judgment.
HiCoLoRA: Addressing Context-Prompt Misalignment via Hierarchical Collaborative LoRA for Zero-Shot DST
Shuyu Zhang | Yifan Wei | Xinru Wang | Yanmin Zhu | Yangfan He | Yixuan Weng | Yujie Liu | Bin Li
Findings of the Association for Computational Linguistics: ACL 2026
Zero-shot Dialog State Tracking (zs-DST) is essential for enabling Task-Oriented Dialog Systems (TODs) to generalize to new domains without costly data annotation. A central challenge lies in the semantic misalignment between dynamic dialog contexts and static prompts, leading to inflexible cross-layer coordination, domain interference, and catastrophic forgetting. To tackle this, we propose Hierarchical Collaborative Low-Rank Adaptation (HiCoLoRA), a framework that enhances zero-shot slot inference through robust prompt alignment. It features a hierarchical LoRA architecture for dynamic layer-specific processing (combining lower-layer heuristic grouping and higher-layer full interaction), integrates Spectral Joint Domain-Slot Clustering to identify transferable associations (feeding an Adaptive Linear Fusion Mechanism), and employs Semantic-Enhanced SVD Initialization (SemSVD-Init) to preserve pre-trained knowledge. Experiments on multi-domain datasets MultiWOZ and SGD show that HiCoLoRA outperforms baselines, achieving SOTA in zs-DST.
2024
Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent
Xiaoyan Yu | Tongxu Luo | Yifan Wei | Fangyu Lei | Yiming Huang | Hao Peng | Liehuang Zhu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) have revolutionized open-domain dialogue agents but encounter challenges in multi-character role-playing (MCRP) scenarios. To address this issue, we present Neeko, an innovative framework designed for efficient multi-character imitation. Neeko employs a dynamic low-rank adapter (LoRA) strategy, enabling it to adapt seamlessly to diverse characters. Our framework breaks down the role-playing process into agent pre-training, multiple characters playing, and character incremental learning, effectively handling both seen and unseen roles. This dynamic approach, coupled with distinct LoRA blocks for each character, enhances Neeko’s adaptability to unique attributes, personalities, and speaking patterns. As a result, Neeko demonstrates superior performance in MCRP over most existing methods, offering more engaging and versatile user interaction experiences.
EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification
Huanhuan Ma | Weizhi Xu | Yifan Wei | Liuji Chen | Liang Wang | Qiang Liu | Shu Wu | Liang Wang
Findings of the Association for Computational Linguistics: ACL 2024
Fact verification aims to automatically probe the veracity of a claim based on several pieces of evidence. Existing work focuses mainly on improving accuracy and largely neglects explainability, a critical capability of fact verification systems. Constructing an explainable fact verification system in a complex multi-hop scenario is consistently impeded by the absence of a relevant, high-quality dataset. Previous datasets either suffer from excessive simplification or fail to incorporate essential considerations for explainability. To address this, we present EX-FEVER, a pioneering dataset for multi-hop explainable fact verification. It contains over 60,000 claims involving 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents. Each instance is accompanied by a veracity label and an explanation that outlines the reasoning path supporting the veracity classification. Additionally, we demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification, and validate the significance of our dataset. Furthermore, we highlight the potential of utilizing Large Language Models in the fact verification task. We hope our dataset will make a significant contribution by providing ample opportunities to explore the integration of natural language explanations in the domain of fact verification.
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
Yiming Huang | Jianwen Luo | Yan Yu | Yitong Zhang | Fangyu Lei | Yifan Wei | Shizhu He | Lifu Huang | Xiao Liu | Jun Zhao | Kang Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must utilize complex data science programming languages, including Python and SQL, to perform intricate data processing and derive the answers. We set up the benchmark in a controllable and executable environment that aligns with real-world data analysis scenarios and is scalable. The annotators meticulously designed the evaluation suite to ensure the accuracy and robustness of the evaluation. We developed the DA-Agent baseline. Experiments show that although the baseline performs better than other existing frameworks, using the current best LLMs achieves only 30.5% accuracy, leaving ample room for improvement. We release our benchmark at [link](https://github.com/yiyihum/dabench).
2023
S3HQA: A Three-Stage Approach for Multi-hop Text-Table Hybrid Question Answering
Fangyu Lei | Xiang Li | Yifan Wei | Shizhu He | Yiming Huang | Jun Zhao | Kang Liu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Answering multi-hop questions over hybrid factual knowledge from the given text and table (TextTableQA) is a challenging task. Existing models mainly adopt a retriever-reader framework, which has several deficiencies, such as noisy labeling in retriever training, insufficient utilization of heterogeneous information over text and table, and deficient ability for different reasoning operations. In this paper, we propose a three-stage TextTableQA framework, S3HQA, which comprises a retriever, a selector, and a reasoner. We use a retriever with refinement training to solve the noisy labeling problem. Then, a hybrid selector considers the linked relationships between heterogeneous data to select the most relevant factual knowledge. For the final stage, instead of adopting a reading comprehension module as in previous methods, we employ a generation-based reasoner to obtain answers. This includes two approaches: a row-wise generator and an LLM prompting generator (used for the first time in this task). The experimental results demonstrate that our method achieves competitive results in the few-shot setting. When trained on the full dataset, our approach outperforms all baseline methods, ranking first on the HybridQA leaderboard.
MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models
Yifan Wei | Yisong Su | Huanhuan Ma | Xiaoyan Yu | Fangyu Lei | Yuanzhe Zhang | Jun Zhao | Kang Liu
Findings of the Association for Computational Linguistics: EMNLP 2023
Large language models (LLMs) have shown nearly saturated performance on many natural language processing (NLP) tasks. As a result, it is natural for people to believe that LLMs have also mastered abilities such as time understanding and reasoning. However, research on the temporal sensitivity of LLMs has been insufficiently emphasized. To fill this gap, this paper constructs Multiple Sensitive Factors Time QA (MenatQA), which encompasses three temporal factors (scope factor, order factor, counterfactual factor) with a total of 2,853 samples for evaluating the time comprehension and reasoning abilities of LLMs. This paper tests current mainstream LLMs with different parameter sizes, ranging from billions to hundreds of billions. The results show that most LLMs fall behind smaller temporal reasoning models to varying degrees on these factors. Specifically, LLMs show a significant vulnerability to temporal biases and depend heavily on the temporal information provided in questions. Furthermore, this paper undertakes a preliminary investigation into potential improvement strategies by devising specific prompts and leveraging external tools. These approaches serve as valuable baselines or references for future research endeavors.
Co-authors
- Fangyu Lei 4
- Yiming Huang 3
- Kang Liu 3
- Jun Zhao 3
- Shizhu He (何世柱) 2
- Bin Li 2
- Yujie Liu 2
- Huanhuan Ma 2
- Xinru Wang 2
- Xiaoyan Yu 2
- Shuyu Zhang 2
- Yanmin Zhu 2
- Liuji Chen 1
- Yangfan He 1
- Lifu Huang 1
- Xiang Li 1
- Qiang Liu 1
- Xiao Liu 1
- Tongxu Luo 1
- Jianwen Luo 1
- Hao Peng 1
- Yisong Su 1
- Liang Wang 1
- Liang Wang 1
- Yixuan Weng 1
- Shu Wu 1
- Weizhi Xu 1
- Yan Yu 1
- Jialuo Yuan 1
- Yuanzhe Zhang 1
- Yitong Zhang 1
- Liehuang Zhu 1