Xiaoting Wu
2026
Beyond Static Profiles: Capturing the Fluidity of User Preferences in Diverse Scenarios
Chunyang Gao | Yi Huang | Jingyu Yao | Xiaoting Wu | Junlan Feng
Findings of the Association for Computational Linguistics: ACL 2026
Despite the remarkable evolution of Large Language Models (LLMs) from simple assistants to versatile agents, effective personalization remains a significant challenge. Existing approaches often treat user preferences as static or merely time-varying traits, overlooking the dynamic nature of human behavior: preferences can shift, and even conflict, depending on context. To address this limitation, we propose a fine-grained taxonomy to differentiate between stable preferences, which are context-agnostic, and situational preferences, which are context-dependent. Building on this taxonomy, we introduce S2Pref, a new dataset of 10k meticulously curated entries. Each entry is grounded in a multi-turn dialogue that implicitly manifests either a stable or a situational preference, as defined by our hierarchical taxonomy. We further design three complementary evaluation tasks to benchmark LLMs on their ability to prioritize contextual signals, proactively resolve ambiguity, and efficiently infer user preferences. Our dataset and diagnostic tasks provide a practical testbed for advancing dynamic, context-aware personalization in conversational agents.
Thinking Alignment of Scenario-Oriented User Simulation
Xiaoting Wu | Yi Huang | Chunyang Gao | Mengfei Guo | Jingyu Yao | Junlan Feng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing user simulators, whether built by role-play prompting or by supervised fine-tuning (SFT), are generally confined to imitating users’ textual utterances, without adequately considering the multi-faceted cognitive processes that underlie human decision-making during interactions. To facilitate better alignment with real human thinking patterns, we construct the LMSYS-UserThinking dataset, in which we augment 51k human–LLM conversations by reconstructing the user’s inner reasoning both during and at the end of each dialogue. Furthermore, to enhance controllability and situational coherence, we introduce scenario settings that describe the global context and user goals throughout multi-turn conversations. Using this dataset, we train user simulators called ThinkingUS on different base models. We evaluate our approach from both offline and online user simulation perspectives, ultimately demonstrating its effectiveness.
A Learnable Skill Combination Strategy for Multi-task Learning in Natural Language Understanding
Zhe Yang | Yi Huang | Yaqin Chen | Mengfei Guo | Xiaoting Wu | Junlan Feng
Findings of the Association for Computational Linguistics: ACL 2026
In the realm of domain-specific natural language understanding (NLU) tasks, acquiring high-quality labeled data is often arduous, thereby posing significant challenges for effective model training. Multi-task learning (MTL) addresses these limitations by jointly optimizing multiple tasks within a unified framework. In this paper, we introduce a novel sparse NLU multi-task learning framework that decomposes the language model into modular skill components and employs a dynamic, learnable skill-combination mechanism to adaptively handle diverse tasks. Extensive experiments on benchmark NLU datasets demonstrate that our proposed method surpasses conventional multi-task learning approaches in performance.
2025
Palette of Language Models: A Solver for Controlled Text Generation
Zhe Yang | Yi Huang | Yaqin Chen | Xiaoting Wu | Junlan Feng | Chao Deng
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Recent advancements in large language models have revolutionized text generation with their remarkable capabilities. These models can produce controlled texts that closely adhere to specific requirements when prompted appropriately. However, designing an optimal prompt to control multiple attributes simultaneously can be challenging. A common approach is to linearly combine single-attribute models, but this strategy often overlooks attribute overlaps and can lead to conflicts. Therefore, we propose a novel combination strategy inspired by the Law of Total Probability and Conditional Mutual Information Minimization on generative language models. This method has been adapted for the single-attribute control scenario and is termed the Palette of Language Models due to its theoretical linkage between attribute strength and generation style, akin to blending colors on an artist’s palette. Moreover, positive correlation and attribute enhancement are advanced as theoretical properties to guide the design of a rational combination strategy. We conduct experiments on both single-attribute and multi-attribute control settings and achieve superior results.
2024
LLM as a metric critic for low resource relation identification
Zhe Yang | Yi Huang | Yaqin Chen | Xiaoting Wu | Junlan Feng | Chao Deng
Findings of the Association for Computational Linguistics: EMNLP 2024
In extremely low-resource relation identification scenarios, small language models (SLMs) tend to overfit, which significantly diminishes their accuracy. Recently, large language models (LLMs) have increasingly been applied to classification tasks by converting the original objective into a generation task via in-context learning. However, the abundance of classifier categories poses challenges in selecting demonstrations. Moreover, the mapping between category labels and textual descriptions requires expensive expert knowledge, thereby constraining the efficacy of in-context learning for LLMs. We maintain that the SLM is optimal for handling classification tasks, and that its shortcomings in the low-resource setting can be mitigated by leveraging the LLM. Hence, we propose a co-evolution strategy between the SLM and the LLM for relation identification. Specifically, the LLM provides essential background knowledge to assist the training process of the SLM classifier, while evaluation metrics from the classifier, in turn, offer valuable insights to refine the generation prompts of the LLM. We conduct experiments on several datasets, which demonstrate the superiority of the proposed model.
2022
CMCC: A Comprehensive and Large-Scale Human-Human Dataset for Dialogue Systems
Yi Huang | Xiaoting Wu | Si Chen | Wei Hu | Qing Zhu | Junlan Feng | Chao Deng | Zhijian Ou | Jiangjiang Zhao
Proceedings of the Towards Semi-Supervised and Reinforced Task-Oriented Dialog Systems (SereTOD)
Dialogue modeling problems severely limit the real-world deployment of neural conversational models, and building a human-like dialogue agent is an extremely challenging task. Recently, data-driven models have become increasingly prevalent, and they need a huge amount of conversation data. In this paper, we release around 100,000 dialogues, which come from real-world dialogue transcripts between real users and customer-service staff. We call this dataset the CMCC (China Mobile Customer Care) dataset; it differs significantly from existing dialogue datasets in both size and nature. The dataset reflects several characteristics of human-human conversations, e.g., being task-driven and care-oriented and exhibiting long-term dependencies within the context. It also covers various dialogue types, including task-oriented, chitchat, and conversational recommendation, in real-world scenarios. To our knowledge, CMCC is the largest real human-human spoken dialogue dataset and has dozens of times the data scale of others, which should significantly promote the training and evaluation of dialogue modeling methods. The results of extensive experiments indicate that CMCC is challenging and needs further effort. We hope that this resource will allow more effective models to be built across various dialogue sub-problems in the future.
State-Aware Adversarial Training for Utterance-Level Dialogue Generation
Yi Huang | Xiaoting Wu | Wei Hu | Junlan Feng | Chao Deng
Proceedings of the Towards Semi-Supervised and Reinforced Task-Oriented Dialog Systems (SereTOD)
Dialogue generation is a challenging problem because it requires us not only to model the context in a conversation but also to exploit it to generate a coherent and fluent utterance. This paper, focusing on a specific topic in this field, proposes an adversarial-training-based framework for utterance-level dialogue generation. Technically, we train an encoder-decoder generator simultaneously with a discriminative classifier that makes the utterance approximate the state-aware inputs. Experiments on the MultiWoZ 2.0 and MultiWoZ 2.1 datasets show that our method achieves improvements on both automatic and human evaluations and remains effective in low-resource settings. We further explore the effect of fine-grained augmentations for downstream dialogue state tracking (DST) tasks. Experimental results demonstrate that the high-quality data generated by our proposed framework improves performance over state-of-the-art models.
2021
Counterfactual Matters: Intrinsic Probing For Dialogue State Tracking
Yi Huang | Junlan Feng | Xiaoting Wu | Xiaoyu Du
The First Workshop on Evaluations and Assessments of Neural Conversation Systems
A Dialogue State Tracker (DST) is a core component of modular task-oriented dialogue systems. Tremendous research progress has been made in the past ten years to improve the performance of DSTs, especially on benchmark datasets. However, their generalization to novel and realistic scenarios beyond the held-out conversations is limited. In this paper, we design experimental studies to answer: 1) How does the distribution of dialogue data affect the performance of DSTs? 2) What are effective ways to probe counterfactual matter for DSTs? Our findings are: the performance variance of generative DSTs is not only due to the model structure itself, but can also be attributed to the distribution of cross-domain values. Evaluating iconic generative DST models on the MultiWOZ dataset with counterfactuals results in a significant performance drop of up to 34.64% (from 50.91% to 16.27%) in absolute joint goal accuracy. We believe that our experimental results can guide future work to better understand the intrinsic core of DST and to rethink suitable approaches for specific tasks given the properties of the application.
2020
Towards Low-Resource Semi-Supervised Dialogue Generation with Meta-Learning
Yi Huang | Junlan Feng | Shuo Ma | Xiaoyu Du | Xiaoting Wu
Findings of the Association for Computational Linguistics: EMNLP 2020
In this paper, we propose a meta-learning based semi-supervised explicit dialogue state tracker (SEDST) for neural dialogue generation, denoted as MEDST. Our main motivation is to further bridge the chasm between the need for a high-accuracy dialogue state tracker and the common reality that only scarce annotated data is available for most real-life dialogue tasks. Specifically, MEDST has two core steps: meta-training with adequate unlabelled data in an automatic way and meta-testing with a small amount of annotated data by supervised learning. In particular, we enhance SEDST via entropy regularization, and investigate semi-supervised learning frameworks based on model-agnostic meta-learning (MAML) that are able to reduce the amount of required intermediate state labelling. We find that by leveraging un-annotated data in a meta-learning fashion instead, the amount of dialogue state annotations can be reduced below 10% while maintaining equivalent system performance. Experimental results show that MEDST outperforms SEDST substantially, by 18.7% joint goal accuracy and 14.3% entity match rate on the KVRET corpus with 2% labelled data in semi-supervision.
Meta-Reinforced Multi-Domain State Generator for Dialogue Systems
Yi Huang | Junlan Feng | Min Hu | Xiaoting Wu | Xiaoyu Du | Shuo Ma
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
A Dialogue State Tracker (DST) is a core component of a modular task-oriented dialogue system. Tremendous progress has been made in recent years. However, major challenges remain. The state-of-the-art accuracy for DST is below 50% for a multi-domain dialogue task. A learnable DST for any new domain requires a large amount of labeled in-domain data and training from scratch. In this paper, we propose a Meta-Reinforced Multi-Domain State Generator (MERET). Our first contribution is to improve the DST accuracy. We enhance a neural-model-based DST generator with a reward manager, which is built on policy gradient reinforcement learning (RL) to fine-tune the generator. With this change, we are able to improve the joint accuracy of DST from 48.79% to 50.91% on the MultiWOZ corpus. Second, we explore training a DST meta-learning model with a few domains as source domains and a new domain as the target domain. We apply the model-agnostic meta-learning algorithm (MAML) to DST, and the obtained meta-learning model is used for new domain adaptation. Our experimental results show that this solution is able to outperform the traditional training approach with far less training data in the target domain.