Xiaohu Liu


2023

pdf
PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer
Xu Han | Bin Guo | Yoon Jung | Benjamin Yao | Yu Zhang | Xiaohu Liu | Chenlei Guo
Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)

pdf
UseClean: learning from complex noisy labels in named entity recognition
Jinjin Tian | Kun Zhou | Meiguo Wang | Yu Zhang | Benjamin Yao | Xiaohu Liu | Chenlei Guo
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)

We investigate and refine denoising methods for NER task on data that potentially contains extremely noisy labels from multi-sources. In this paper, we first summarized all possible noise types and noise generation schemes, based on which we built a thorough evaluation system. We then pinpoint the bottleneck of current state-of-art denoising methods using our evaluation system. Correspondingly, we propose several refinements, including using a two-stage framework to avoid error accumulation; a novel confidence score utilizing minimal clean supervision to increase predictive power; an automatic cutoff fitting to save extensive hyper-parameter tuning; a warm started weighted partial CRF to better learn on the noisy tokens. Additionally, we propose to use adaptive sampling to further boost the performance in long-tailed entity settings. Our method improves F1 score by on average at least 5 10% over current state-of-art across extensive experiments.

2022

pdf
Overcoming Catastrophic Forgetting During Domain Adaptation of Seq2seq Language Generation
Dingcheng Li | Zheng Chen | Eunah Cho | Jie Hao | Xiaohu Liu | Fan Xing | Chenlei Guo | Yang Liu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Seq2seq language generation models that are trained offline with multiple domains in a sequential fashion often suffer from catastrophic forgetting. Lifelong learning has been proposed to handle this problem. However, existing work such as experience replay or elastic weighted consolidation requires incremental memory space. In this work, we propose an innovative framework, RMR_DSEthat leverages a recall optimization mechanism to selectively memorize important parameters of previous tasks via regularization, and uses a domain drift estimation algorithm to compensate the drift between different do-mains in the embedding space. These designs enable the model to be trained on the current task while keep-ing the memory of previous tasks, and avoid much additional data storage. Furthermore, RMR_DSE can be combined with existing lifelong learning approaches. Our experiments on two seq2seq language generation tasks, paraphrase and dialog response generation, show thatRMR_DSE outperforms SOTA models by a considerable margin and reduces forgetting greatly.

pdf
Joint Goal Segmentation and Goal Success Prediction on Multi-Domain Conversations
Meiguo Wang | Benjamin Yao | Bin Guo | Xiaohu Liu | Yu Zhang | Tuan-Hung Pham | Chenlei Guo
Proceedings of the 29th International Conference on Computational Linguistics

To evaluate the performance of a multi-domain goal-oriented Dialogue System (DS), it is important to understand what the users’ goals are for the conversations and whether those goals are successfully achieved. The success rate of goals directly correlates with user satisfaction and perceived usefulness of the DS. In this paper, we propose a novel automatic dialogue evaluation framework that jointly performs two tasks: goal segmentation and goal success prediction. We extend the RoBERTa-IQ model (Gupta et al., 2021) by adding multi-task learning heads for goal segmentation and success prediction. Using an annotated dataset from a commercial DS, we demonstrate that our proposed model reaches an accuracy that is on-par with single-pass human annotation comparing to a three-pass gold annotation benchmark.

2016

pdf
Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform
Paul Crook | Alex Marin | Vipul Agarwal | Khushboo Aggarwal | Tasos Anastasakos | Ravi Bikkula | Daniel Boies | Asli Celikyilmaz | Senthilkumar Chandramohan | Zhaleh Feizollahi | Roman Holenstein | Minwoo Jeong | Omar Khan | Young-Bum Kim | Elizabeth Krawczyk | Xiaohu Liu | Danko Panic | Vasiliy Radostev | Nikhil Ramesh | Jean-Phillipe Robichaud | Alexandre Rochette | Logan Stromberg | Ruhi Sarikaya
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2015

pdf
Compact Lexicon Selection with Spectral Methods
Young-Bum Kim | Karl Stratos | Xiaohu Liu | Ruhi Sarikaya
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

1999

pdf
Mixed Language Query Disambiguation
Pascale Fung | Xiaohu Liu | Chi Shun Cheung
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics