Yang Zhao


A Simple Yet Effective Corpus Construction Method for Chinese Sentence Compression
Yang Zhao | Hiroshi Kanayama | Issei Yoshida | Masayasu Muraoka | Akiko Aizawa
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Deletion-based sentence compression in the English language has made significant progress over the past few decades. However, there is a lack of large-scale and high-quality parallel corpus (i.e., (sentence, compression) pairs) for the Chinese language to train an efficient compression system. To remedy this shortcoming, we present a dependency-tree-based method to construct a Chinese corpus with 151k pairs of sentences and compression based on Chinese language-specific characteristics. Subsequently, we trained both extractive and generative neural compression models using the constructed corpus. The experimental results show that our compression model can generate high-quality compressed sentences on both automatic and human evaluation metrics compared with the baselines. The results of the faithfulness evaluation also indicated that the Chinese compression model trained on our constructed corpus can produce more faithful compressed sentences. Furthermore, a dataset with 1,000 pairs of sentences and ground truth compression was manually created for automatic evaluation, which, we believe, will benefit future research on Chinese sentence compression.

A Versatile Adaptive Curriculum Learning Framework for Task-oriented Dialogue Policy Learning
Yang Zhao | Hua Qin | Wang Zhenyu | Changxi Zhu | Shihan Wang
Findings of the Association for Computational Linguistics: NAACL 2022

Training a deep reinforcement learning-based dialogue policy with brute-force random sampling is costly. A new training paradigm was proposed to improve learning performance and efficiency by combining curriculum learning. However, attempts in the field of dialogue policy are very limited due to the lack of reliable evaluation of difficulty scores of dialogue tasks and the high sensitivity to the mode of progression through dialogue tasks. In this paper, we present a novel versatile adaptive curriculum learning (VACL) framework, which presents a substantial step toward applying automatic curriculum learning on dialogue policy tasks. It supports evaluating the difficulty of dialogue tasks only using the learning experiences of dialogue policy and skip-level selection according to their learning needs to maximize the learning efficiency. Moreover, an attractive feature of VACL is the construction of a generic, elastic global curriculum while training a good dialogue policy that could guide different dialogue policy learning without extra effort on re-training. The superiority and versatility of VACL are validated on three public dialogue datasets.

A Simple Yet Effective Hybrid Pre-trained Language Model for Unsupervised Sentence Acceptability Prediction
Yang Zhao | Issei Yoshida
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Sentence acceptability judgment assesses to what degree a sentence is acceptable to native speakers of the language. Most unsupervised prediction approaches rely on a language model to obtain the likelihood of a sentence that reflects acceptability. However, two problems exist: first, low-frequency words would have a significant negative impact on the sentence likelihood derived from the language model; second, when it comes to multiple domains, the language model needs to be trained on domain-specific text for domain adaptation. To address both problems, we propose a simple method that substitutes Part-of-Speech (POS) tags for low-frequency words in sentences used for continual training of masked language models. Experimental results show that our word-tag-hybrid BERT model brings improvement on both a sentence acceptability benchmark and a cross-domain sentence acceptability evaluation corpus. Furthermore, our annotated cross-domain sentence acceptability evaluation corpus would benefit future research.

Can We Really Trust Explanations? Evaluating the Stability of Feature Attribution Explanation Methods via Adversarial Attack
Yang Zhao | Zhang Yuanzhe | Jiang Zhongtao | Ju Yiming | Zhao Jun | Liu Kang
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“Explanations can increase the transparency of neural networks and make them more trustworthy. However, can we really trust explanations generated by the existing explanation methods? If the explanation methods are not stable enough, the credibility of the explanation will be greatly reduced. Previous studies seldom considered such an important issue. To this end, this paper proposes a new evaluation frame to evaluate the stability of current typical feature attribution explanation methods via textual adversarial attack. Our frame could generate adversarial examples with similar textual semantics. Such adversarial examples will make the original models have the same outputs, but make most current explanation methods deduce completely different explanations. Under this frame, we test five classical explanation methods and show their performance on several stability-related metrics. Experimental results show our evaluation is effective and could reveal the stability performance of existing explanation methods.”


Rethinking Sentiment Style Transfer
Ping Yu | Yang Zhao | Chunyuan Li | Changyou Chen
Findings of the Association for Computational Linguistics: EMNLP 2021

Though remarkable efforts have been made in non-parallel text style transfer, the evaluation system is unsatisfactory. It always evaluates over samples from only one checkpoint of the model and compares three metrics, i.e., transfer accuracy, BLEU score, and PPL score. In this paper, we argue the inappropriateness of both existing evaluation metrics and the evaluation method. Specifically, for evaluation metrics, we make a detailed analysis and comparison from three aspects: style transfer, content preservation, and naturalness; for the evaluation method, we reiterate the fallacy of picking one checkpoint for model comparison. As a result, we establish a robust evaluation method by examining the trade-off between style transfer and naturalness, and between content preservation and naturalness. Notably, we elaborate the human evaluation and automatically identify the inaccurate measurement of content preservation computed by the BLEU score. To overcome this issue, we propose a graph-based method to extract attribute content and attribute-independent content from input sentences in the YELP dataset and IMDB dataset. With the modified datasets, we design a new evaluation metric called “attribute hit” and propose an efficient regularization to leverage the attribute-dependent content and attribute-independent content as guiding signals. Experimental results have demonstrated the effectiveness of the proposed strategy.


CASIA’s System for IWSLT 2020 Open Domain Translation
Qian Wang | Yuchen Liu | Cong Ma | Yu Lu | Yining Wang | Long Zhou | Yang Zhao | Jiajun Zhang | Chengqing Zong
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes the CASIA’s system for the IWSLT 2020 open domain translation task. This year we participate in both Chinese→Japanese and Japanese→Chinese translation tasks. Our system is neural machine translation system based on Transformer model. We augment the training data with knowledge distillation and back translation to improve the translation performance. Domain data classification and weighted domain model ensemble are introduced to generate the final translation result. We compare and analyze the performance on development data with different model settings and different data processing techniques.

Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
Vitou Phy | Yang Zhao | Akiko Aizawa
Proceedings of the 28th International Conference on Computational Linguistics

Many automatic evaluation metrics have been proposed to score the overall quality of a response in open-domain dialogue. Generally, the overall quality is comprised of various aspects, such as relevancy, specificity, and empathy, and the importance of each aspect differs according to the task. For instance, specificity is mandatory in a food-ordering dialogue task, whereas fluency is preferred in a language-teaching dialogue system. However, existing metrics are not designed to cope with such flexibility. For example, BLEU score fundamentally relies only on word overlapping, whereas BERTScore relies on semantic similarity between reference and candidate response. Thus, they are not guaranteed to capture the required aspects, i.e., specificity. To design a metric that is flexible to a task, we first propose making these qualities manageable by grouping them into three groups: understandability, sensibleness, and likability, where likability is a combination of qualities that are essential for a task. We also propose a simple method to composite metrics of each aspect to obtain a single metric called USL-H, which stands for Understandability, Sensibleness, and Likability in Hierarchy. We demonstrated that USL-H score achieves good correlations with human judgment and maintains its configurability towards different aspects and metrics.

Knowledge Graph Enhanced Neural Machine Translation via Multi-task Learning on Sub-entity Granularity
Yang Zhao | Lu Xiang | Junnan Zhu | Jiajun Zhang | Yu Zhou | Chengqing Zong
Proceedings of the 28th International Conference on Computational Linguistics

Previous studies combining knowledge graph (KG) with neural machine translation (NMT) have two problems: i) Knowledge under-utilization: they only focus on the entities that appear in both KG and training sentence pairs, making much knowledge in KG unable to be fully utilized. ii) Granularity mismatch: the current KG methods utilize the entity as the basic granularity, while NMT utilizes the sub-word as the granularity, making the KG different to be utilized in NMT. To alleviate above problems, we propose a multi-task learning method on sub-entity granularity. Specifically, we first split the entities in KG and sentence pairs into sub-entity granularity by using joint BPE. Then we utilize the multi-task learning to combine the machine translation task and knowledge reasoning task. The extensive experiments on various translation tasks have demonstrated that our method significantly outperforms the baseline models in both translation quality and handling the entities.

Q-learning with Language Model for Edit-based Unsupervised Summarization
Ryosuke Kohita | Akifumi Wachi | Yang Zhao | Ryuki Tachibana
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Unsupervised methods are promising for abstractive textsummarization in that the parallel corpora is not required. However, their performance is still far from being satisfied, therefore research on promising solutions is on-going. In this paper, we propose a new approach based on Q-learning with an edit-based summarization. The method combines two key modules to form an Editorial Agent and Language Model converter (EALM). The agent predicts edit actions (e.t., delete, keep, and replace), and then the LM converter deterministically generates a summary on the basis of the action signals. Q-learning is leveraged to train the agent to produce proper edit actions. Experimental results show that EALM delivered competitive performance compared with the previous encoder-decoder-based methods, even with truly zero paired data (i.e., no validation set). Defining the task as Q-learning enables us not only to develop a competitive method but also to make the latest techniques in reinforcement learning available for unsupervised summarization. We also conduct qualitative analysis, providing insights into future study on unsupervised summarizers.

Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning
Xiaomian Kang | Yang Zhao | Jiajun Zhang | Chengqing Zong
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Document-level neural machine translation has yielded attractive improvements. However, majority of existing methods roughly use all context sentences in a fixed scope. They neglect the fact that different source sentences need different sizes of context. To address this problem, we propose an effective approach to select dynamic context so that the document-level translation model can utilize the more useful selected context sentences to produce better translations. Specifically, we introduce a selection module that is independent of the translation module to score each candidate context sentence. Then, we propose two strategies to explicitly select a variable number of context sentences and feed them into the translation module. We train the two modules end-to-end via reinforcement learning. A novel reward is proposed to encourage the selection and utilization of dynamic context sentences. Experiments demonstrate that our approach can select adaptive context sentences for different source sentences, and significantly improves the performance of document-level translation methods.


Unsupervised Rewriter for Multi-Sentence Compression
Yang Zhao | Xiaoyu Shen | Wei Bi | Akiko Aizawa
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multi-sentence compression (MSC) aims to generate a grammatical but reduced compression from multiple input sentences while retaining their key information. Previous dominating approach for MSC is the extraction-based word graph approach. A few variants further leveraged lexical substitution to yield more abstractive compression. However, two limitations exist. First, the word graph approach that simply concatenates fragments from multiple sentences may yield non-fluent or ungrammatical compression. Second, lexical substitution is often inappropriate without the consideration of context information. To tackle the above-mentioned issues, we present a neural rewriter for multi-sentence compression that does not need any parallel corpus. Empirical studies have shown that our approach achieves comparable results upon automatic evaluation and improves the grammaticality of compression based on human evaluation. A parallel corpus with more than 140,000 (sentence group, compression) pairs is also constructed as a by-product for future research.

Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator
Xiaoyu Shen | Yang Zhao | Hui Su | Dietrich Klakow
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Pointer Generators have been the de facto standard for modern summarization systems. However, this architecture faces two major drawbacks: Firstly, the pointer is limited to copying the exact words while ignoring possible inflections or abstractions, which restricts its power of capturing richer latent alignment. Secondly, the copy mechanism results in a strong bias towards extractive generations, where most sentences are produced by simply copying from the source text. In this paper, we address these problems by allowing the model to “edit” pointed tokens instead of always hard copying them. The editing is performed by transforming the pointed word vector into a target space with a learned relation embedding. On three large-scale summarization dataset, we show the model is able to (1) capture more latent alignment relations than exact word matches, (2) improve word alignment accuracy, allowing for better model interpretation and controlling, (3) generate higher-quality summaries validated by both qualitative and quantitative evaluations and (4) bring more abstraction to the generated summaries.


Exploiting Pre-Ordering for Neural Machine Translation
Yang Zhao | Jiajun Zhang | Chengqing Zong
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Addressing Troublesome Words in Neural Machine Translation
Yang Zhao | Jiajun Zhang | Zhongjun He | Chengqing Zong | Hua Wu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

One of the weaknesses of Neural Machine Translation (NMT) is in handling lowfrequency and ambiguous words, which we refer as troublesome words. To address this problem, we propose a novel memoryenhanced NMT method. First, we investigate different strategies to define and detect the troublesome words. Then, a contextual memory is constructed to memorize which target words should be produced in what situations. Finally, we design a hybrid model to dynamically access the contextual memory so as to correctly translate the troublesome words. The extensive experiments on Chinese-to-English and English-to-German translation tasks demonstrate that our method significantly outperforms the strong baseline models in translation quality, especially in handling troublesome words.

A Language Model based Evaluator for Sentence Compression
Yang Zhao | Zhiyuan Luo | Akiko Aizawa
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We herein present a language-model-based evaluator for deletion-based sentence compression and view this task as a series of deletion-and-evaluation operations using the evaluator. More specifically, the evaluator is a syntactic neural language model that is first built by learning the syntactic and structural collocation among words. Subsequently, a series of trial-and-error deletion operations are conducted on the source sentences via a reinforcement learning framework to obtain the best target compression. An empirical study shows that the proposed model can effectively generate more readable compression, comparable or superior to several strong baselines. Furthermore, we introduce a 200-sentence test set for a large-scale dataset, setting a new baseline for the future research.


Towards Neural Machine Translation with Partially Aligned Corpora
Yining Wang | Yang Zhao | Jiajun Zhang | Chengqing Zong | Zhengshan Xue
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While neural machine translation (NMT) has become the new paradigm, the parameter optimization requires large-scale parallel data which is scarce in many domains and language pairs. In this paper, we address a new translation scenario in which there only exists monolingual corpora and phrase pairs. We propose a new method towards translation with partially aligned sentence pairs which are derived from the phrase pairs and monolingual corpora. To make full use of the partially aligned corpora, we adapt the conventional NMT training method in two aspects. On one hand, different generation strategies are designed for aligned and unaligned target words. On the other hand, a different objective function is designed to model the partially aligned parts. The experiments demonstrate that our method can achieve a relatively good result in such a translation scenario, and tiny bitexts can boost translation quality to a large extent.

A Conditional Variational Framework for Dialog Generation
Xiaoyu Shen | Hui Su | Yanran Li | Wenjie Li | Shuzi Niu | Yang Zhao | Akiko Aizawa | Guoping Long
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Deep latent variable models have been shown to facilitate the response generation for open-domain dialog systems. However, these latent variables are highly randomized, leading to uncontrollable generated responses. In this paper, we propose a framework allowing conditional response generation based on specific attributes. These attributes can be either manually assigned or automatically detected. Moreover, the dialog states for both speakers are modeled separately in order to reflect personal features. We validate this framework on two different scenarios, where the attribute refers to genericness and sentiment states respectively. The experiment result testified the potential of our model, where meaningful responses can be generated in accordance with the specified attributes.