Kaisheng Yao

2021

Neural Sequence Segmentation as Determining the Leftmost Segments
Yangming Li | Lemao Liu | Kaisheng Yao
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Prior methods for text segmentation operate mostly at the token level. Although adequate, this limits their ability to capture long-term dependencies among segments. In this work, we propose a novel framework that incrementally segments natural language sentences at the segment level. At every step of segmentation, it recognizes the leftmost segment of the remaining sequence. The implementation uses the LSTM-minus technique to construct phrase representations and recurrent neural networks (RNNs) to model the iterations of determining the leftmost segments. We have conducted extensive experiments on syntactic chunking and Chinese part-of-speech (POS) tagging across 3 datasets, demonstrating that our methods significantly outperform all previous baselines and achieve new state-of-the-art results. Moreover, qualitative analysis and a study on segmenting long sentences verify its effectiveness in modeling long-term dependencies.
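The incremental procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy selection, the `max_len` cap, and the toy lexicon scorer are assumptions made for illustration, and the function names are hypothetical. In the paper the scoring is learned, with an RNN modeling the iterations. The `lstm_minus` helper shows only the general idea of representing a span as the difference of two prefix states.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def lstm_minus(prefix_states: Sequence[Sequence[float]], start: int, end: int) -> List[float]:
    """Span representation as a difference of prefix states.

    prefix_states[k] is the (hypothetical) recurrent state after reading k tokens,
    so prefix_states[0] is the initial state.
    """
    return [a - b for a, b in zip(prefix_states[end], prefix_states[start])]


def segment_leftmost(
    tokens: Sequence[str],
    score: Callable[[Sequence[str], int, int], float],
    max_len: int = 4,
) -> List[Span]:
    """Repeatedly pick the best-scoring leftmost segment of the remaining sequence."""
    segments: List[Span] = []
    start = 0
    while start < len(tokens):
        last_end = min(len(tokens), start + max_len)
        # Every candidate begins at `start`; only the right boundary varies.
        best_end = max(range(start + 1, last_end + 1), key=lambda end: score(tokens, start, end))
        segments.append((start, best_end))
        start = best_end  # continue with the remainder of the sentence
    return segments


# Toy scorer (assumption): a span scores 1.0 if it is a known phrase, else 0.0.
KNOWN_PHRASES = {("the", "cat"), ("sat",), ("on", "the", "mat")}


def toy_score(tokens: Sequence[str], start: int, end: int) -> float:
    return 1.0 if tuple(tokens[start:end]) in KNOWN_PHRASES else 0.0


sentence = ["the", "cat", "sat", "on", "the", "mat"]
spans = segment_leftmost(sentence, toy_score)
```

Because each step commits to a whole segment rather than a single token label, the decision at one step can condition on the segments already produced, which is the property the abstract associates with modeling long-term dependencies.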

Rewriter-Evaluator Architecture for Neural Machine Translation
Yangming Li | Kaisheng Yao
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A few approaches have been developed to improve neural machine translation (NMT) models with multiple passes of decoding. However, their performance gains are limited because they lack proper policies for terminating the multi-pass process. To address this issue, we introduce a novel architecture, Rewriter-Evaluator. Translating a source sentence involves multiple rewriting passes. In every pass, a rewriter generates a new translation to improve the past translation. Termination of this multi-pass process is determined by a translation quality score estimated by an evaluator. We also propose prioritized gradient descent (PGD) to jointly and efficiently train the rewriter and the evaluator. Extensive experiments on three machine translation tasks show that our architecture notably improves the performance of NMT models and significantly outperforms prior methods. An oracle experiment reveals that it can largely reduce the performance gap to the oracle policy. Experiments confirm that the evaluator trained with PGD is more accurate than prior methods in determining the proper number of rewriting passes.
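The multi-pass decoding loop described in this abstract can be sketched roughly as follows. This is a hedged illustration, not the paper's algorithm: the stopping rule (stop when the evaluator's score no longer improves by more than `min_gain`, or after `max_passes`) is an assumption, and the toy rewriter and evaluator are hypothetical stand-ins for the trained models. PGD training is not shown.

```python
from typing import Callable, Tuple


def multi_pass_translate(
    source: str,
    rewriter: Callable[[str, str], str],
    evaluator: Callable[[str, str], float],
    max_passes: int = 6,
    min_gain: float = 0.0,
) -> Tuple[str, int]:
    """Rewrite the translation until the evaluator's score stops improving."""
    translation = ""  # the first pass starts from an empty draft
    best_score = float("-inf")
    passes = 0
    for _ in range(max_passes):
        candidate = rewriter(source, translation)
        score = evaluator(source, candidate)
        passes += 1
        if score - best_score <= min_gain:
            break  # assumed termination policy: no measurable improvement
        translation, best_score = candidate, score
    return translation, passes


# Toy stand-ins (assumptions): the "translation" is the upper-cased source,
# and each rewriting pass adds one more correct word to the draft.
def toy_rewriter(source: str, previous: str) -> str:
    target = source.upper().split()
    words = previous.split()
    if len(words) < len(target):
        words.append(target[len(words)])
    return " ".join(words)


def toy_evaluator(source: str, candidate: str) -> float:
    target = source.upper().split()
    matches = sum(a == b for a, b in zip(candidate.split(), target))
    return matches / len(target)


result = multi_pass_translate("good morning world", toy_rewriter, toy_evaluator)
truncated = multi_pass_translate("good morning world", toy_rewriter, toy_evaluator, max_passes=2)
```

In this toy run the loop needs one extra pass to observe that the score has stopped improving, which shows why the accuracy of the evaluator directly determines how many rewriting passes are spent.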

2020

Slot-consistent NLG for Task-oriented Dialogue Systems with Iterative Rectification Network
Yangming Li | Kaisheng Yao | Libo Qin | Wanxiang Che | Xiaolong Li | Ting Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Data-driven approaches using neural networks have achieved promising performance in natural language generation (NLG). However, neural generators are prone to making mistakes, e.g., neglecting an input slot value or generating a redundant slot value. Prior works refer to this as the hallucination phenomenon. In this paper, we study slot consistency for building reliable NLG systems in which all slot values of the input dialogue act (DA) are properly generated in the output sentences. We propose the Iterative Rectification Network (IRN) for improving general NLG systems to produce both correct and fluent responses. It applies a bootstrapping algorithm to sample training candidates and uses reinforcement learning to incorporate a discrete reward related to slot inconsistency into training. Comprehensive studies have been conducted on multiple benchmark datasets, showing that the proposed methods significantly reduce the slot error rate (ERR) for all strong baselines. Human evaluations have also confirmed its effectiveness.

Handling Rare Entities for Neural Sequence Labeling
Yangming Li | Han Li | Kaisheng Yao | Xiaolong Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

One great challenge in neural sequence labeling is the data sparsity problem for rare entity words and phrases. Most test set entities appear only a few times in the training corpus or are even unseen, yielding a large number of out-of-vocabulary (OOV) and low-frequency (LF) entities during evaluation. In this work, we propose approaches to address this problem. For OOV entities, we introduce local context reconstruction to implicitly incorporate contextual information into their representations. For LF entities, we present delexicalized entity identification to explicitly extract their frequency-agnostic and entity-type-specific representations. Extensive experiments on multiple benchmark datasets show that our model significantly outperforms all previous methods and achieves new state-of-the-art results. Notably, our methods surpass the model fine-tuned on pre-trained language models without external resources.
