Yanmeng Wang


2025

ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents
Zhigen Li | Jianxiang Peng | Yanmeng Wang | Yong Cao | Tianhao Shen | Minghui Zhang | Linxi Su | Shang Wu | Yihang Wu | YuQian Wang | Ye Wang | Wei Hu | Jianfeng Li | Shaojun Wang | Jing Xiao | Deyi Xiong
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dialogue agents powered by Large Language Models (LLMs) show superior performance in various tasks. Despite their better user understanding and human-like responses, their lack of controllability remains a key challenge, often leading to unfocused conversations or task failure. To address this, we introduce Standard Operating Procedures (SOPs) to regulate dialogue flow. Specifically, we propose ChatSOP, a novel SOP-guided Monte Carlo Tree Search (MCTS) planning framework designed to enhance the controllability of LLM-driven dialogue agents. To enable this, we curate a dataset of SOP-annotated multi-scenario dialogues, generated with a semi-automated role-playing system based on GPT-4o and validated through strict manual quality control. Additionally, we propose a novel method that integrates Chain-of-Thought reasoning with supervised fine-tuning for SOP prediction and uses SOP-guided Monte Carlo Tree Search for optimal action planning during dialogues. Experimental results demonstrate the effectiveness of our method, including a 27.95% improvement in action accuracy over GPT-3.5-based baselines and notable gains for open-source models. The dataset and code are publicly available.
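As a rough illustration of how SOP-guided planning constrains the search space, the following minimal sketch runs a standard UCT-based MCTS loop over a toy SOP graph. The graph, state names, and reward function are hypothetical stand-ins, not the paper's implementation.

```python
import math
import random

# Hypothetical SOP graph: each dialogue action maps to the actions the SOP allows next.
SOP_GRAPH = {
    "greet": ["ask_issue"],
    "ask_issue": ["clarify", "propose_solution"],
    "clarify": ["propose_solution"],
    "propose_solution": ["confirm", "clarify"],
    "confirm": [],
}

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def uct(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout_reward(state):
    # Placeholder reward; a real system would score a simulated continuation of the dialogue.
    return 1.0 if state == "confirm" else 0.3 * random.random()

def mcts(root_state, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCT, only along SOP-legal actions.
        while node.children and len(node.children) == len(SOP_GRAPH[node.state]):
            node = max(node.children.values(), key=lambda ch: uct(ch, node.visits))
        # Expansion: try one SOP-legal action that has not been explored yet.
        untried = [a for a in SOP_GRAPH[node.state] if a not in node.children]
        if untried:
            action = random.choice(untried)
            node.children[action] = Node(action, parent=node)
            node = node.children[action]
        # Simulation + backpropagation.
        reward = rollout_reward(node.state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)

print(mcts("ask_issue"))  # picks the next SOP-legal action, e.g. "propose_solution"
```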

Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Jianfei Zhang | Jun Bai | Bei Li | Yanmeng Wang | Rumei Li | Chenghua Lin | Wenge Rong
Proceedings of the 31st International Conference on Computational Linguistics

Aligning Large Language Models (LLMs) with general human preferences has proven crucial for improving the quality of interaction between LLMs and humans. However, human values are inherently diverse across individuals, so aligning LLMs solely with general preferences is insufficient. To address this, personalizing LLMs according to individual feedback emerges as a promising solution. Nonetheless, this approach poses challenges for the efficiency of alignment algorithms. In this work, we introduce a flexible paradigm for individual preference alignment. Our method fundamentally improves efficiency by disentangling preference representation from text generation in LLMs. We validate our approach across multiple text generation tasks and demonstrate that it produces alignment quality as good as or better than PEFT-based methods, while reducing the additional training time for each new individual preference by 80% to 90% compared with them.
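One way to picture the disentanglement is a small, trainable per-user preference module that conditions a frozen generator, so a new preference only requires training the small module. The sketch below is a hypothetical soft-prefix variant (the module and function names are illustrative, and `base_lm` is assumed to be a Hugging Face causal LM), not the paper's architecture.

```python
import torch
import torch.nn as nn

class PreferencePrefix(nn.Module):
    """Illustrative per-user preference module: maps a user ID to a few soft-prompt
    vectors that are prepended to the frozen LLM's input embeddings."""
    def __init__(self, num_users, prefix_len, hidden_size):
        super().__init__()
        self.prefix_len, self.hidden_size = prefix_len, hidden_size
        self.user_prefs = nn.Embedding(num_users, prefix_len * hidden_size)

    def forward(self, user_ids):
        # (batch, prefix_len, hidden_size)
        return self.user_prefs(user_ids).view(-1, self.prefix_len, self.hidden_size)

def forward_with_preference(base_lm, prefix_module, input_ids, user_ids):
    """base_lm is assumed frozen; only prefix_module receives gradients."""
    tok_emb = base_lm.get_input_embeddings()(input_ids)  # (B, T, H)
    pref_emb = prefix_module(user_ids)                    # (B, P, H)
    inputs_embeds = torch.cat([pref_emb, tok_emb], dim=1)
    return base_lm(inputs_embeds=inputs_embeds)

# Training sketch: freeze base_lm.parameters() and optimize only prefix_module.parameters()
# against each user's individual preference feedback.
```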

Selecting Demonstrations for Many-Shot In-Context Learning via Gradient Matching
Jianfei Zhang | Bei Li | Jun Bai | Rumei Li | Yanmeng Wang | Chenghua Lin | Wenge Rong
Findings of the Association for Computational Linguistics: ACL 2025

In-Context Learning (ICL) empowers Large Language Models (LLMs) to adapt rapidly to new tasks without Fine-Tuning (FT), but its reliance on demonstration selection remains a critical challenge. While many-shot ICL shows promising performance through scaled demonstrations, existing work selects many-shot demonstrations only at random. Since conventional instance-level retrieval is not suitable for many-shot scenarios, we hypothesize that the data requirements of in-context learning and fine-tuning are analogous. To this end, we introduce a novel gradient matching approach that selects demonstrations by aligning fine-tuning gradients between the entire training set of the target task and the selected examples, so that the selected examples approximate the learning effect of the full training set. Through gradient matching on relatively small models, e.g., Qwen2.5-3B or Llama3-8B, our method consistently outperforms random selection on larger LLMs from 4-shot to 128-shot scenarios across 9 diverse datasets. For instance, it surpasses random selection by 4% on Qwen2.5-72B and Llama3-70B, and by around 2% on 5 closed-source LLMs. This work unlocks more reliable and effective many-shot ICL, paving the way for its broader application.
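The core selection idea, picking demonstrations whose combined fine-tuning gradient points in the same direction as the full training set's gradient, can be sketched as a greedy cosine-similarity match over precomputed per-example gradient vectors. The sketch below is an illustrative simplification; the greedy loop and the way gradients are flattened are assumptions, not the paper's exact procedure.

```python
import numpy as np

def select_demonstrations(example_grads, k):
    """Greedy gradient matching: pick k examples whose summed gradient best aligns
    (cosine similarity) with the mean gradient of the full training set.

    example_grads: (N, D) array of per-example fine-tuning gradients from a small
    proxy model (e.g. flattened last-layer gradients) -- an illustrative choice.
    """
    target = example_grads.mean(axis=0)
    selected, current = [], np.zeros_like(target)
    for _ in range(k):
        best_i, best_sim = None, -np.inf
        for i in range(len(example_grads)):
            if i in selected:
                continue
            cand = current + example_grads[i]
            sim = cand @ target / (np.linalg.norm(cand) * np.linalg.norm(target) + 1e-8)
            if sim > best_sim:
                best_i, best_sim = i, sim
        selected.append(best_i)
        current += example_grads[best_i]
    return selected

# Toy usage: 100 examples with 32-dimensional gradient sketches, pick 8 demonstrations.
grads = np.random.randn(100, 32)
print(select_demonstrations(grads, k=8))
```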

2022

Learning to Adapt to Low-Resource Paraphrase Generation
Zhigen Li | Yanmeng Wang | Rizhao Fan | Ye Wang | Jianfeng Li | Shaojun Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Paraphrase generation is a longstanding NLP task that has achieved great success with the aid of large corpora. However, transferring a paraphrasing model to another domain encounters the problem of domain shift, especially when data are sparse. At the same time, widely used large pre-trained language models (PLMs) face overfitting when trained on scarce labeled data. To mitigate these two issues, we propose LAPA, an effective adapter for PLMs optimized by meta-learning. LAPA is trained in three stages on three types of related resources: 1. pre-training PLMs on unsupervised corpora, 2. inserting an adapter layer and meta-training it on source-domain labeled data, and 3. fine-tuning the adapters on a small amount of target-domain labeled data. This method enables paraphrase generation models to first learn basic language knowledge, then learn the paraphrasing task itself, and finally adapt to the target task. Our experimental results demonstrate that LAPA achieves state-of-the-art performance in supervised, unsupervised, and low-resource settings on three benchmark datasets. With only 2% of trainable parameters and 1% of the labeled data of the target task, our approach achieves performance competitive with previous work.
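To make the three-stage recipe concrete, the sketch below shows a bottleneck adapter layer and a Reptile-style meta-training loop as a stand-in for stage 2. The meta-learning algorithm, hyperparameters, and interfaces here are illustrative assumptions, not LAPA's actual code.

```python
import copy
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after a transformer sub-layer (illustrative)."""
    def __init__(self, hidden, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual connection keeps the PLM intact

def reptile_meta_train(adapter, source_tasks, inner_steps=5, inner_lr=1e-3, meta_lr=0.1):
    """Stage 2 (sketch): Reptile-style meta-training of adapter weights on source domains.
    `source_tasks` yields callables that compute a paraphrase loss for a given adapter copy."""
    for task_loss in source_tasks:
        fast = copy.deepcopy(adapter)
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            opt.zero_grad()
            task_loss(fast).backward()
            opt.step()
        # Move the slow (meta) weights toward the task-adapted fast weights.
        with torch.no_grad():
            for p_slow, p_fast in zip(adapter.parameters(), fast.parameters()):
                p_slow += meta_lr * (p_fast - p_slow)
```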

PINGAN Omini-Sinitic at SemEval-2022 Task 4: Multi-prompt Training for Patronizing and Condescending Language Detection
Ye Wang | Yanmeng Wang | Baishun Ling | Zexiang Liao | Shaojun Wang | Jing Xiao
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the second-placed system for subtask 2 and the ninth-placed system for subtask 1 in SemEval 2022 Task 4: Patronizing and Condescending Language Detection. We propose an ensemble of prompt training and a label attention mechanism for multi-label classification tasks. Transfer learning is introduced to transfer knowledge from binary classification to multi-label classification. The experimental results prove the effectiveness of our proposed method. An ablation study is also conducted to show the validity of each technique.
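As an illustration of the label attention component, the hypothetical module below gives each label a learned query that attends over encoder token states to produce per-label logits for multi-label classification; it is a generic sketch, not the competition system's code.

```python
import torch
import torch.nn as nn

class LabelAttentionHead(nn.Module):
    """Illustrative label attention head: each label learns a query vector that attends
    over encoder token states, yielding a label-specific pooled representation and score."""
    def __init__(self, hidden, num_labels):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden))
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, token_states, attention_mask):
        # token_states: (B, T, H); attention_mask: (B, T)
        scores = torch.einsum("lh,bth->blt", self.label_queries, token_states)
        scores = scores.masked_fill(attention_mask.unsqueeze(1) == 0, -1e9)
        attn = scores.softmax(dim=-1)                         # (B, L, T)
        label_repr = torch.einsum("blt,bth->blh", attn, token_states)
        return self.scorer(label_repr).squeeze(-1)            # (B, L) multi-label logits
```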

2021

Enhancing Dual-Encoders with Question and Answer Cross-Embeddings for Answer Retrieval
Yanmeng Wang | Jun Bai | Ye Wang | Jianfei Zhang | Wenge Rong | Zongcheng Ji | Shaojun Wang | Jing Xiao
Findings of the Association for Computational Linguistics: EMNLP 2021

Dual-Encoders are a promising mechanism for answer retrieval in question answering (QA) systems. Currently, most conventional Dual-Encoders learn the semantic representations of questions and answers merely through the matching score. Researchers have proposed introducing QA interaction features into the scoring function, but at the cost of low efficiency at the inference stage. To keep the encoding of questions and answers independent at inference time, a variational auto-encoder has further been introduced to reconstruct answers (questions) from question (answer) embeddings as an auxiliary task that enhances QA interaction during representation learning in the training stage. However, the needs of text generation and answer retrieval differ, which makes training difficult. In this work, we propose a framework that enhances the Dual-Encoders model with question-answer cross-embeddings and a novel Geometry Alignment Mechanism (GAM) to align the geometry of embeddings from Dual-Encoders with that from Cross-Encoders. Extensive experimental results show that our framework significantly improves the Dual-Encoders model and outperforms the state-of-the-art method on multiple answer retrieval datasets.
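A minimal way to picture the geometry alignment idea is an auxiliary loss that pulls the in-batch similarity structure of dual-encoder embeddings toward that of cross-encoder embeddings, added to the usual in-batch contrastive retrieval loss. The functions below are an illustrative sketch under that assumption, not the paper's exact GAM formulation.

```python
import torch
import torch.nn.functional as F

def geometry_alignment_loss(dual_q, dual_a, cross_emb):
    """Illustrative geometry alignment: match the in-batch similarity structure of
    dual-encoder question/answer embeddings to that of cross-encoder embeddings.

    dual_q, dual_a: (B, H) question / answer embeddings from the independent encoders.
    cross_emb:      (B, H) embeddings of the same (question, answer) pairs from a cross-encoder.
    """
    dual_sim = F.normalize(dual_q, dim=-1) @ F.normalize(dual_a, dim=-1).T   # (B, B)
    cross_norm = F.normalize(cross_emb, dim=-1)
    cross_sim = cross_norm @ cross_norm.T                                     # (B, B)
    return F.mse_loss(dual_sim, cross_sim)

def retrieval_loss(dual_q, dual_a, temperature=0.05):
    """Standard in-batch contrastive loss for answer retrieval."""
    logits = F.normalize(dual_q, dim=-1) @ F.normalize(dual_a, dim=-1).T / temperature
    labels = torch.arange(dual_q.size(0), device=dual_q.device)
    return F.cross_entropy(logits, labels)

# Combined objective (sketch): retrieval loss plus a weighted alignment term.
# loss = retrieval_loss(q, a) + 0.1 * geometry_alignment_loss(q, a, c)
```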

PINGAN Omini-Sinitic at SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning
Ye Wang | Yanmeng Wang | Haijun Zhu | Bo Zeng | Zhenghong Hao | Shaojun Wang | Jing Xiao
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes the winning system for subtask 2 and the second-placed system for subtask 1 in SemEval 2021 Task 4: Reading Comprehension of Abstract Meaning. We propose to use a pre-trained ELECTRA discriminator to choose the best abstract word from five candidates. An upper attention and auto-denoising mechanism is introduced to process the long sequences. The experimental results demonstrate that this contribution greatly facilitates contextual language modeling in the reading comprehension task. An ablation study is also conducted to show the validity of our proposed methods.
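A bare-bones way to use an ELECTRA discriminator for this candidate-selection setting is to substitute each of the five candidates into the passage and keep the one the replaced-token-detection head finds most natural. The snippet below sketches that idea with Hugging Face `transformers`; the scoring rule (mean replaced-token logit) and the placeholder handling are simplifying assumptions, and it omits the upper attention and auto-denoising components described above.

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-large-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-large-discriminator").eval()

def choose_candidate(context_with_placeholder, candidates, placeholder="@placeholder"):
    """Fill each candidate word into the passage and ask the discriminator how 'replaced'
    (i.e. implausible) the resulting sequence looks; return the most plausible candidate."""
    scores = []
    for word in candidates:
        text = context_with_placeholder.replace(placeholder, word)
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            logits = model(**inputs).logits   # (1, T); higher = more likely replaced
        scores.append(logits.mean().item())   # lower mean = more natural sequence
    return candidates[int(torch.tensor(scores).argmin())]
```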