Chang Liu


Attend, Select and Eliminate: Accelerating Multi-turn Response Selection with Dual-attention-based Content Elimination
Jianxin Liang | Chang Liu | Chongyang Tao | Jiazhan Feng | Dongyan Zhao
Findings of the Association for Computational Linguistics: ACL 2023

Although the incorporation of pre-trained language models (PLMs) significantly pushes the research frontier of multi-turn response selection, it brings a new issue of heavy computation costs. To alleviate this problem and make the PLM-based response selection model both effective and efficient, we propose an inference framework together with a post-training strategy that builds upon any pre-trained transformer-based response selection model to accelerate inference by progressively selecting and eliminating unimportant content under the guidance of context-response dual-attention. Specifically, at each transformer layer, we first identify the importance of each word based on context-to-response and response-to-context attention, then select a number of unimportant words to be eliminated following a retention configuration derived from evolutionary search while passing the rest of the representations into deeper layers. To mitigate the training-inference gap posed by content elimination, we introduce a post-training strategy where we use knowledge distillation to force the model with progressively eliminated content to mimic the predictions of the original model with no content elimination. Experiments on three benchmarks indicate that our method can effectively speed up SOTA models without much performance degradation and shows a better trade-off between speed and performance than previous methods.
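As a rough illustration of the selection step described in this abstract, the sketch below scores each token by the attention it receives from the other segment and keeps the highest-scoring ones. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the mean-of-received-attention scoring rule, and the fixed `keep` count (standing in for the retention configuration found by evolutionary search) are all hypothetical.

```python
from typing import Dict, List, Sequence


def dual_attention_importance(
    attn: Sequence[Sequence[float]],
    context_idx: Sequence[int],
    response_idx: Sequence[int],
) -> Dict[int, float]:
    """Score tokens by cross-segment attention (assumed scoring rule).

    attn[i][j] is the attention weight from query token i to key token j.
    A context token is scored by the mean attention it receives from
    response tokens, and a response token by the mean attention it
    receives from context tokens.
    """
    scores: Dict[int, float] = {}
    for j in context_idx:
        scores[j] = sum(attn[i][j] for i in response_idx) / len(response_idx)
    for j in response_idx:
        scores[j] = sum(attn[i][j] for i in context_idx) / len(context_idx)
    return scores


def retain_tokens(scores: Dict[int, float], keep: int) -> List[int]:
    """Return the positions of the `keep` highest-scoring tokens, in order."""
    ranked = sorted(scores, key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy example: tokens 0-1 form the context, tokens 2-3 the response.
toy_attn = [
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.5, 0.1, 0.2, 0.2],
    [0.6, 0.2, 0.1, 0.1],
]
toy_scores = dual_attention_importance(toy_attn, [0, 1], [2, 3])
kept = retain_tokens(toy_scores, keep=2)
```

In a layer-wise setting, only the representations at the `kept` positions would be passed on to the next layer, with a per-layer `keep` value taken from the retention configuration.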

Path Spuriousness-aware Reinforcement Learning for Multi-Hop Knowledge Graph Reasoning
Chunyang Jiang | Tianchen Zhu | Haoyi Zhou | Chang Liu | Ting Deng | Chunming Hu | Jianxin Li
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Multi-hop reasoning, a prevalent approach for query answering, aims at inferring new facts along reasonable paths over a knowledge graph. Reinforcement learning (RL) methods can be adopted by formulating the problem as a Markov decision process. However, a common problem with RL-based reasoning models is that the agent can be biased toward spurious paths, which coincidentally lead to the correct answer but offer a poor explanation. In this work, we take a deep dive into this phenomenon and define a metric named Path Spuriousness (PS) to quantitatively estimate to what extent a path is spurious. Guided by the definition of PS, we design a model with a new reward that considers both answer accuracy and path reasonableness. We test our method on four datasets, and experiments reveal that it considerably enhances the agent’s capacity to avoid spurious paths while keeping performance comparable to the state of the art.

CORE: Cooperative Training of Retriever-Reranker for Effective Dialogue Response Selection
Chongyang Tao | Jiazhan Feng | Tao Shen | Chang Liu | Juntao Li | Xiubo Geng | Daxin Jiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Establishing retrieval-based dialogue systems that can select appropriate responses from the pre-built index has gained increasing attention. Recent common practice is to construct a two-stage pipeline with a fast retriever (e.g., bi-encoder) for first-stage recall followed by a smart response reranker (e.g., cross-encoder) for precise ranking. However, existing studies either optimize the retriever and reranker in independent ways, or distill the knowledge from a pre-trained reranker into the retriever in an asynchronous way, leading to sub-optimal performance of both modules. Thus, an open question remains about how to train them for a better combination of the best of both worlds. To this end, we present a cooperative training of the response retriever and the reranker whose parameters are dynamically optimized by the ground-truth labels as well as list-wise supervision signals from each other. As a result, the two modules can learn from each other and evolve together throughout the training. Experimental results on two benchmarks demonstrate the superiority of our method.

More than Classification: A Unified Framework for Event Temporal Relation Extraction
Quzhe Huang | Yutong Hu | Shengqi Zhu | Yansong Feng | Chang Liu | Dongyan Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Event temporal relation extraction (ETRE) is usually formulated as a multi-label classification task, where each type of relation is simply treated as a one-hot label. This formulation ignores the meaning of relations and wipes out their intrinsic dependency. After examining the relation definitions in various ETRE tasks, we observe that all relations can be interpreted using the start and end time points of events. For example, relation Includes could be interpreted as event 1 starting no later than event 2 and ending no earlier than event 2. In this paper, we propose a unified event temporal relation extraction framework, which transforms temporal relations into logical expressions of time points and completes the ETRE by predicting the relations between certain time point pairs. Experiments on TB-Dense and MATRES show significant improvements over a strong baseline and outperform the state-of-the-art model by 0.3% on both datasets. By representing all relations in a unified framework, we can leverage the relations with sufficient data to assist the learning of other relations, thus achieving stable improvement in low-data scenarios. When the relation definitions are changed, our method can quickly adapt to the new ones by simply modifying the logic expressions that map time points to new event relations. The code is released at
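The idea of expressing relations as logical conditions on start and end time points can be sketched as follows. This is a minimal sketch, not the released code: the `Event` class, the `RELATIONS` table, and the `Before` definition are assumptions added for illustration, and only the `Includes` condition is taken from the abstract. The sketch also checks relations on known time points, whereas the framework described above predicts the time point relations from text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    start: float  # start time point
    end: float    # end time point


def includes(e1: Event, e2: Event) -> bool:
    # From the abstract: event 1 starts no later than event 2
    # and ends no earlier than event 2.
    return e1.start <= e2.start and e1.end >= e2.end


def before(e1: Event, e2: Event) -> bool:
    # Assumed definition (not given in the abstract):
    # event 1 ends before event 2 starts.
    return e1.end < e2.start


# Hypothetical mapping from relation names to logical expressions.
# Adapting to a new relation definition means editing only this table.
RELATIONS: Dict[str, Callable[[Event, Event], bool]] = {
    "Includes": includes,
    "Before": before,
}


def relations_between(e1: Event, e2: Event) -> List[str]:
    """Return the names of all relations whose expression holds."""
    return [name for name, holds in RELATIONS.items() if holds(e1, e2)]
```

For example, `relations_between(Event(0, 10), Event(2, 5))` evaluates the `Includes` condition from the abstract on two events with known time points.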


ProphetChat: Enhancing Dialogue Generation with Simulation of Future Conversation
Chang Liu | Xu Tan | Chongyang Tao | Zhenxin Fu | Dongyan Zhao | Tie-Yan Liu | Rui Yan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Typical generative dialogue models utilize the dialogue history to generate the response. However, since one dialogue utterance can often be appropriately answered by multiple distinct responses, generating a desired response solely based on the historical information is not easy. Intuitively, if the chatbot can foresee in advance what the user would talk about (i.e., the dialogue future) after receiving its response, it could possibly provide a more informative response. Accordingly, we propose a novel dialogue generation framework named ProphetChat that utilizes the simulated dialogue futures in the inference phase to enhance response generation. To enable the chatbot to foresee the dialogue future, we design a beam-search-like roll-out strategy for dialogue future simulation using a typical dialogue generation model and a dialogue selector. With the simulated futures, we then utilize the ensemble of a history-to-response generator and a future-to-response generator to jointly generate a more informative response. Experiments on two popular open-domain dialogue datasets demonstrate that ProphetChat can generate better responses over strong baselines, which validates the advantages of incorporating the simulated dialogue futures.

Multi-Granularity Structural Knowledge Distillation for Language Model Compression
Chang Liu | Chongyang Tao | Jiazhan Feng | Dongyan Zhao
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Transferring the knowledge to a small model through distillation has raised great interest in recent years. Prevailing methods transfer the knowledge derived from mono-granularity language units (e.g., token-level or sample-level), which is not enough to represent the rich semantics of a text and may lose some vital knowledge. Besides, these methods form the knowledge as individual representations or their simple dependencies, neglecting abundant structural relations among intermediate representations. To overcome the problems, we present a novel knowledge distillation framework that gathers intermediate representations from multiple semantic granularities (e.g., tokens, spans and samples) and forms the knowledge as more sophisticated structural relations specified as the pair-wise interactions and the triplet-wise geometric angles based on multi-granularity representations. Moreover, we propose distilling the well-organized multi-granularity structural knowledge to the student hierarchically across layers. Experimental results on GLUE benchmark demonstrate that our method outperforms advanced distillation methods.

Reciprocal Learning of Knowledge Retriever and Response Ranker for Knowledge-Grounded Conversations
Jiazhan Feng | Chongyang Tao | Zhen Li | Chang Liu | Tao Shen | Dongyan Zhao
Proceedings of the 29th International Conference on Computational Linguistics

Grounding dialogue agents with knowledge documents has sparked increased attention in both academia and industry. Recently, a growing body of work is trying to build retrieval-based knowledge-grounded dialogue systems. While promising, these approaches require collecting pairs of dialogue context and the corresponding ground-truth knowledge sentences that contain the information regarding the dialogue context. Unfortunately, hand-labeling data to that end is time-consuming, and many datasets and applications lack such knowledge annotations. In this paper, we propose a reciprocal learning approach to jointly optimize a knowledge retriever and a response ranker for knowledge-grounded response retrieval without ground-truth knowledge labels. Specifically, the knowledge retriever uses the feedback from the response ranker as pseudo supervised signals of knowledge retrieval for updating its parameters, while the response ranker also receives the top-ranked knowledge sentences from knowledge retriever for optimization. Evaluation results on two public benchmarks show that our model can significantly outperform previous state-of-the-art methods.

Rethinking Task-Specific Knowledge Distillation: Contextualized Corpus as Better Textbook
Chang Liu | Chongyang Tao | Jianxin Liang | Tao Shen | Jiazhan Feng | Quzhe Huang | Dongyan Zhao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Knowledge distillation has been proven effective when customizing small language models for specific tasks. Here, a corpus as ‘textbook’ plays an indispensable role, only through which the teacher can teach the student. Prevailing methods adopt a two-stage distillation paradigm: general distillation first with a task-agnostic general corpus and task-specific distillation next with an augmented task-specific corpus. We argue that such a paradigm may not be optimal. In general distillation, it’s extravagant to let the diverse but desultory general knowledge overwhelm the limited model capacity of the student. In task-specific distillation, meanwhile, the task corpus is usually limited and narrow, preventing the student from learning enough knowledge. To mitigate the issues in the two gapped corpora, we present a better textbook for the student to learn: a contextualized corpus that contextualizes the task corpus with a large-scale general corpus through relevance-based text retrieval. Experimental results on the GLUE benchmark demonstrate that the contextualized corpus is the better textbook compared with jointly using the general corpus and the augmented task-specific corpus. Surprisingly, it enables task-specific distillation from scratch without general distillation while maintaining comparable performance, making it more flexible to customize the student model with the desired model size under various computation constraints.

SMASH: Improving SMAll Language Models’ Few-SHot Ability with Prompt-Based Distillation
Yueqian Wang | Chang Liu | Kai Chen | Xi Wang | Dongyan Zhao
Findings of the Association for Computational Linguistics: EMNLP 2022

Large-scale language models coupled with prompts have shown remarkable performance on few-shot learning. However, through systematic experiments, we find that the few-shot performance of small language models is poor, and using prompts on them brings fewer improvements than on larger ones. In this paper, we propose SMASH, an approach to improve SMAll language models’ few-SHot ability by training on intermediate tasks before prompt-based fine-tuning on downstream tasks. We design intermediate tasks for sentence-pair tasks and sentiment classification tasks by creating training examples with prompt templates similar to downstream tasks using sentences sampled from a large-scale unsupervised corpus, and apply knowledge distillation to distill from outputs of larger pre-trained models as the training objective. We conduct extensive experiments and show that SMASH can make a 6-layer DistilRoBERTa-base achieve comparable performance on few-shot datasets with a 12-layer RoBERTa-base at a low cost.

How to Represent Context Better? An Empirical Study on Context Modeling for Multi-turn Response Selection
Jiazhan Feng | Chongyang Tao | Chang Liu | Rui Yan | Dongyan Zhao
Findings of the Association for Computational Linguistics: EMNLP 2022

Building retrieval-based dialogue models that can predict appropriate responses based on the understanding of multi-turn context messages is a challenging problem. Early models usually concatenate all utterances or independently encode each dialogue turn, which may lead to an inadequate understanding of dialogue status. Although a few researchers have noticed the importance of context modeling in multi-turn response prediction, there is no systematic comparison to analyze how to model context effectively and no framework to unify those methods. In this paper, instead of configuring new architectures, we investigate how to improve existing models with a better context modeling method. Specifically, we heuristically summarize three categories of turn-aware context modeling strategies which model the context messages from the perspective of sequential relationship, local relationship, and query-aware manner respectively. A Turn-Aware Context Modeling (TACM) layer is explored to flexibly adapt and unify these context modeling strategies to several advanced response selection models. Evaluation results on three public data sets indicate that employing each individual context modeling strategy or multiple strategies can consistently improve the performance of existing models.


BioGen: Generating Biography Summary under Table Guidance on Wikipedia
Shen Gao | Xiuying Chen | Chang Liu | Dongyan Zhao | Rui Yan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

字里行间的道德:中文文本道德句识别研究(Morality Between the Lines: Research on Identification of Chinese Moral Sentence)
Shiya Peng (彭诗雅) | Chang Liu (刘畅) | Yayue Deng (邓雅月) | Dong Yu (于东)
Proceedings of the 20th Chinese National Conference on Computational Linguistics



基于跨语言双语预训练及Bi-LSTM的汉-越平行句对抽取方法(Chinese-Vietnamese Parallel Sentence Pair Extraction Method Based on Cross-lingual Bilingual Pre-training and Bi-LSTM)
Chang Liu (刘畅) | Shengxiang Gao (高盛祥) | Zhengtao Yu (余正涛) | Yuxin Huang (黄于欣) | Congcong You (尤丛丛)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

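Chinese-Vietnamese parallel sentence pair extraction is an important way to alleviate the data scarcity of Chinese-Vietnamese parallel corpora. Parallel sentence pair extraction can be cast as a sentence similarity classification task in a shared semantic space, and its core lies in aligning the bilingual semantic spaces. Traditional semantic space alignment methods rely on large-scale bilingual parallel corpora, which are relatively hard to obtain for Vietnamese, a low-resource language. To address this problem, this paper proposes a Chinese-Vietnamese parallel sentence pair extraction method based on cross-lingual bilingual pre-training with a seed dictionary and Bi-LSTM (Bi-directional Long Short-Term Memory). Pre-training requires only a large amount of Chinese and Vietnamese monolingual data and a Chinese-Vietnamese seed dictionary, which is used to map the two languages into a common semantic space for word alignment. Bi-LSTM and CNN (Convolutional Neural Networks) are then used to extract the global and local features of sentences respectively, so as to best represent the semantic relatedness between Chinese-Vietnamese sentence pairs. Experimental results show that the proposed model improves the F1 score by 7.1% and outperforms the baseline models.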

面向人工智能伦理计算的中文道德词典构建方法研究(Construction of a Chinese Moral Dictionary for Artificial Intelligence Ethical Computing)
Hongrui Wang (王弘睿) | Chang Liu (刘畅) | Dong Yu (于东)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


BLCU-NLP at SemEval-2020 Task 5: Data Augmentation for Efficient Counterfactual Detecting
Chang Liu | Dong Yu
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Counterfactuals describe events counter to facts and hence naturally involve common sense, knowledge, and reasoning. SemEval-2020 Task 5 focuses on this field. We participate in subtask 1 and use BERT as our system. Our innovations are feature extraction and data augmentation. We extract and summarize features of counterfactual statements, augment counterfactual examples in the training set with the help of these features, and also experiment with two general data augmentation methods. We demonstrate the effectiveness of our approaches, which achieve an F1 of 0.95 on subtask 1 while using only a subset of the given training set to fine-tune the BERT model; our official submission achieves an F1 of 0.802, which ranks us 16th in the competition.


Kingsoft’s Neural Machine Translation System for WMT19
Xinze Guo | Chang Liu | Xiaolong Li | Yiran Wang | Guoliang Li | Feng Wang | Zhitao Xu | Liuyi Yang | Li Ma | Changliang Li
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the Kingsoft AI Lab’s submission to the WMT2019 news translation shared task. We participated in two language directions: English-Chinese and Chinese-English. For both language directions, we trained several variants of Transformer models using the provided parallel data enlarged with a large quantity of back-translated monolingual data. The best translation result was obtained with ensemble and reranking techniques. According to automatic metrics (BLEU) our Chinese-English system reached the second highest score, and our English-Chinese system reached the second highest score for this subtask.


Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
Chang Liu | Hwee Tou Ng
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation
Ziheng Lin | Chang Liu | Hwee Tou Ng | Min-Yen Kan
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


Better Evaluation Metrics Lead to Better Machine Translation
Chang Liu | Daniel Dahlmeier | Hwee Tou Ng
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

TESLA at WMT 2011: Translation Evaluation and Tunable Metric
Daniel Dahlmeier | Chang Liu | Hwee Tou Ng
Proceedings of the Sixth Workshop on Statistical Machine Translation


TESLA: Translation Evaluation of Sentences with Linear-Programming-Based Analysis
Chang Liu | Daniel Dahlmeier | Hwee Tou Ng
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts
Chang Liu | Daniel Dahlmeier | Hwee Tou Ng
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing


The NUS statistical machine translation system for IWSLT 2009
Preslav Nakov | Chang Liu | Wei Lu | Hwee Tou Ng
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

We describe the system developed by the team of the National University of Singapore for the Chinese-English BTEC task of the IWSLT 2009 evaluation campaign. We adopted a state-of-the-art phrase-based statistical machine translation approach and focused on experiments with different Chinese word segmentation standards. In our official submission, we trained a separate system for each segmenter and we combined the outputs in a subsequent re-ranking step. Given the small size of the training data, we further re-trained the system on the development data after tuning. The evaluation results show that both strategies yield sizeable and consistent improvements in translation quality.


Learning Predictive Structures for Semantic Role Labeling of NomBank
Chang Liu | Hwee Tou Ng
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics