2024
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Pengyu Cheng | Yifan Yang | Jian Li | Yong Dai | Tianhao Hu | Peixin Cao | Nan Du | Xiaolong Li
Findings of the Association for Computational Linguistics: ACL 2024
Human preference alignment is essential for improving the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide LLM optimization. However, continuously updating LLMs for alignment opens a distribution gap between model-generated samples and human-annotated responses, which hinders training effectiveness. To mitigate this issue, previous methods require additional preference annotation on newly generated samples to adapt to the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an Adversarial Preference Optimization (APO) framework, in which the LLM and the reward model update alternately via a min-max game. Through adversarial training, the reward model can adapt to the shifted generation distribution of the LLM without any additional annotation. In comprehensive experiments, we find that the proposed adversarial training framework further improves existing alignment baselines in terms of LLM helpfulness and harmlessness. The code is at https://github.com/Linear95/APO.
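A schematic of the RM-LLM min-max game the abstract describes, written as a single adversarial objective; the symbols (policy \pi_{\theta}, reward model r_{\phi}, annotated data \mathcal{D}_{\mathrm{ann}}) are illustrative, and the paper's actual losses include further terms:

\min_{\phi} \max_{\theta} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_{\theta}(\cdot \mid x)}\!\left[ r_{\phi}(x, y) \right] \;-\; \mathbb{E}_{(x, y^{\star}) \sim \mathcal{D}_{\mathrm{ann}}}\!\left[ r_{\phi}(x, y^{\star}) \right]

The LLM \pi_{\theta} is updated to raise the reward of its own generations, while the reward model r_{\phi} is updated to keep annotated responses y^{\star} scored above those generations, which is how it can track the LLM's shifting generation distribution without new annotation.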
2020
Interactive Question Clarification in Dialogue via Reinforcement Learning
Xiang Hu | Zujie Wen | Yafang Wang | Xiaolong Li | Gerard de Melo
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track
Coping with ambiguous questions has been a perennial problem in real-world dialogue systems. Although asking clarification questions is a common form of human interaction, it is hard to define appropriate questions that elicit more specific intents from a user. In this work, we propose a reinforcement learning model that clarifies ambiguous questions by suggesting refinements of the original query. We first formulate a collection partitioning problem to select a set of labels that lets us distinguish the potential unambiguous intents, and we present the chosen labels to the user as intent phrases for confirmation. The selected label, together with the original user query, then serves as a refined query for which a suitable response can more easily be identified. The model is trained with reinforcement learning using a deep policy network. We evaluate the model on real-world user clicks and demonstrate significant improvements across several different experiments.
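As a rough illustration of the training signal described above, the following minimal PyTorch sketch scores candidate clarification labels with a small policy network, samples one to show the user, and treats the observed click as a REINFORCE reward; the network shape, reward definition, and label encodings are assumptions for illustration, not the paper's configuration.

import torch
import torch.nn as nn

class ClarificationPolicy(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, query_vec, label_vecs):
        # Score each candidate label against the query; return a distribution over labels.
        expanded = query_vec.expand(label_vecs.size(0), -1)
        logits = self.scorer(torch.cat([expanded, label_vecs], dim=-1)).squeeze(-1)
        return torch.distributions.Categorical(logits=logits)

dim = 64
policy = ClarificationPolicy(dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

query_vec = torch.randn(1, dim)    # placeholder encoding of one ambiguous query
label_vecs = torch.randn(5, dim)   # placeholder encodings of candidate intent labels

dist = policy(query_vec, label_vecs)
action = dist.sample()                  # label shown to the user
reward = torch.tensor(1.0)              # e.g. 1.0 if the user clicked the suggestion
loss = -dist.log_prob(action) * reward  # REINFORCE objective
optimizer.zero_grad()
loss.backward()
optimizer.step()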
Slot-consistent NLG for Task-oriented Dialogue Systems with Iterative Rectification Network
Yangming Li | Kaisheng Yao | Libo Qin | Wanxiang Che | Xiaolong Li | Ting Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Data-driven approaches using neural networks have achieved promising performance in natural language generation (NLG). However, neural generators are prone to mistakes such as neglecting an input slot value or generating a redundant one, which prior work refers to as the hallucination phenomenon. In this paper, we study slot consistency for building reliable NLG systems in which all slot values of the input dialogue act (DA) are properly generated in the output sentence. We propose the Iterative Rectification Network (IRN), which improves general NLG systems so that they produce both correct and fluent responses. It applies a bootstrapping algorithm to sample training candidates and uses reinforcement learning to incorporate a discrete reward related to slot inconsistency into training. Comprehensive studies on multiple benchmark datasets show that the proposed method significantly reduces the slot error rate (ERR) for all strong baselines, and human evaluations confirm its effectiveness.
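The discrete slot-inconsistency reward mentioned above can be pictured with a minimal sketch like the one below, which penalizes slot values that are missing from the generated sentence as well as values that appear without being requested; the function name and scaling are hypothetical, not the paper's exact reward.

def slot_inconsistency_reward(dialogue_act: dict, generated: str, known_values: set) -> float:
    # Penalize neglected slot values and hallucinated (redundant) slot values.
    text = generated.lower()
    expected = {str(v).lower() for v in dialogue_act.values()}
    missing = {v for v in expected if v not in text}
    redundant = {v for v in known_values
                 if v.lower() in text and v.lower() not in expected}
    return -float(len(missing) + len(redundant))

da = {"name": "Pizza Hut", "area": "centre"}
vocab_of_slot_values = {"Pizza Hut", "centre", "north", "cheap"}
print(slot_inconsistency_reward(da, "Pizza Hut is a cheap place in the centre.", vocab_of_slot_values))
# -1.0: the value "cheap" is generated but is not in the input DA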
Handling Rare Entities for Neural Sequence Labeling
Yangming Li | Han Li | Kaisheng Yao | Xiaolong Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
A great challenge in neural sequence labeling is data sparsity for rare entity words and phrases. Most test-set entities appear only a few times, or not at all, in the training corpus, yielding a large number of out-of-vocabulary (OOV) and low-frequency (LF) entities at evaluation time. In this work, we propose approaches to address this problem. For OOV entities, we introduce local context reconstruction to implicitly incorporate contextual information into their representations. For LF entities, we present delexicalized entity identification to explicitly extract frequency-agnostic and entity-type-specific representations. Extensive experiments on multiple benchmark datasets show that our model significantly outperforms all previous methods and achieves new state-of-the-art results. Notably, our methods surpass models fine-tuned from pre-trained language models, without using any external resources.
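The delexicalized entity identification idea can be illustrated with a small sketch that swaps rare entity tokens for entity-type placeholders, so the labeler sees a frequency-agnostic, type-specific representation; the frequency threshold and placeholder format are assumptions for illustration only, not the paper's exact procedure.

from collections import Counter

def delexicalize(tokens, entity_types, train_counts: Counter, min_freq: int = 2):
    """Replace rare or unseen entity tokens with their entity-type placeholder."""
    out = []
    for token, etype in zip(tokens, entity_types):
        if etype != "O" and train_counts[token] < min_freq:
            out.append(f"<{etype}>")   # e.g. an unseen person name becomes <PER>
        else:
            out.append(token)
    return out

counts = Counter({"Smith": 1, "visited": 40, "Paris": 30})
print(delexicalize(["Smith", "visited", "Paris"], ["PER", "O", "LOC"], counts))
# ['<PER>', 'visited', 'Paris']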
2019
Kingsoft’s Neural Machine Translation System for WMT19
Xinze Guo | Chang Liu | Xiaolong Li | Yiran Wang | Guoliang Li | Feng Wang | Zhitao Xu | Liuyi Yang | Li Ma | Changliang Li
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
This paper describes the Kingsoft AI Lab submission to the WMT2019 news translation shared task. We participated in two language directions: English-Chinese and Chinese-English. For both directions, we trained several variants of Transformer models on the provided parallel data, enlarged with a large quantity of back-translated monolingual data, and obtained the best translation results with ensemble and reranking techniques. According to automatic metrics (BLEU), our Chinese-English system reached the second-highest score, and our English-Chinese system likewise reached the second-highest score for its subtask.
2018
Cross-Domain Review Helpfulness Prediction Based on Convolutional Neural Networks with Auxiliary Domain Discriminators
Cen Chen | Yinfei Yang | Jun Zhou | Xiaolong Li | Forrest Sheng Bao
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
With the growing number of reviews on e-commerce websites, it is critical to assess the helpfulness of reviews and recommend them to consumers accordingly. Recent studies on review helpfulness require plenty of labeled samples for each domain or category of interest. However, such an approach, based on a closed-world assumption, is not always practical, especially for domains with limited reviews or with the “out-of-vocabulary” problem. We therefore propose a convolutional neural network (CNN) based model that leverages both word-level and character-based representations. To transfer knowledge between domains, we further extend the model to jointly model different domains with auxiliary domain discriminators. On the Amazon product review dataset, our approach significantly outperforms the state of the art in terms of both accuracy and cross-domain robustness.
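A hedged sketch of the joint setup described above: a shared toy CNN encoder feeds both a helpfulness predictor and an auxiliary domain discriminator, and their losses are combined during training; the layer sizes, loss weight, and toy inputs are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, emb_dim=100, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        h = torch.relu(self.conv(x.transpose(1, 2)))
        return h.max(dim=-1).values            # max-pool over time -> (batch, hidden)

encoder = SharedEncoder()
helpfulness_head = nn.Linear(64, 1)            # predicts a helpfulness score
domain_head = nn.Linear(64, 3)                 # discriminates among 3 product domains

x = torch.randn(8, 50, 100)                    # a toy batch of embedded reviews
helpfulness = torch.rand(8, 1)                 # toy helpfulness labels
domains = torch.randint(0, 3, (8,))            # toy domain labels

feats = encoder(x)
task_loss = nn.functional.mse_loss(helpfulness_head(feats), helpfulness)
domain_loss = nn.functional.cross_entropy(domain_head(feats), domains)
loss = task_loss + 0.1 * domain_loss           # jointly model both objectives
loss.backward()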
2016
Reference Resolution in Situated Dialogue with Learned Semantics
Xiaolong Li | Kristy Boyer
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue
2015
Semantic Grounding in Dialogue for Complex Problem Solving
Xiaolong Li | Kristy Boyer
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2010
An Overview of Microsoft Web N-gram Corpus and Applications
Kuansan Wang | Chris Thrasher | Evelyne Viegas | Xiaolong Li | Bo-june Paul Hsu
Proceedings of the NAACL HLT 2010 Demonstration Session
A Large Scale Ranker-Based System for Search Query Spelling Correction
Jianfeng Gao | Xiaolong Li | Daniel Micol | Chris Quirk | Xu Sun
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)