Yang Liu

刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence

Other people with similar names: Yang Janet Liu (Georgetown University; 刘洋), Yang Liu (May refer to several people), Yang Liu (3M Health Information Systems), Yang Liu (University of Helsinki), Yang Liu (National University of Defense Technology), Yang Liu (Edinburgh), Yang Liu (The Chinese University of Hong Kong (Shenzhen)), Yang Liu (刘扬; Ph.D Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon), Yang Liu (Microsoft Cognitive Services Research), Yang Liu (Peking University), Yang Liu (Samsung Research Center Beijing), Yang Liu (Tianjin University, China), Yang Liu (Univ. of Michigan, UC Santa Cruz), Yang Liu (Wilfrid Laurier University)


DirectQuote: A Dataset for Direct Quotation Extraction and Attribution in News Articles
Yuanchi Zhang | Yang Liu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Quotation extraction and attribution are challenging tasks that aim to determine the spans containing quotations and attribute each quotation to its original speaker. Applying these tasks to news data is highly relevant to fact-checking, media monitoring and news tracking. Direct quotations are more traceable and informative, and therefore of great significance among different types of quotations. This paper introduces DirectQuote, a corpus containing 19,760 paragraphs and 10,279 direct quotations manually annotated from online news media. To the best of our knowledge, this is the largest and most complete corpus that focuses on direct quotations in news texts. We ensure that each speaker in the annotation can be linked to a specific named entity on Wikidata, benefiting various downstream tasks. In addition, for the first time, we propose several sequence labeling models as baseline methods to extract and attribute quotations simultaneously in an end-to-end manner.

Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Yulan He | Heng Ji | Sujian Li | Yang Liu | Chua-Hui Chang

Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Yulan He | Heng Ji | Sujian Li | Yang Liu | Chua-Hui Chang

MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators
Zhixing Tan | Xiangwen Zhang | Shuo Wang | Yang Liu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Prompting has recently been shown to be a promising approach for applying pre-trained language models to downstream tasks. We present Multi-Stage Prompting (MSP), a simple and automatic approach for leveraging pre-trained language models in translation tasks. To better mitigate the discrepancy between pre-training and translation, MSP divides the translation process via pre-trained language models into three separate stages: the encoding stage, the re-encoding stage, and the decoding stage. During each stage, we independently apply different continuous prompts to allow pre-trained language models to better shift to translation tasks. We conduct extensive experiments on three translation tasks. Experiments show that our method can significantly improve the translation performance of pre-trained language models.

Integrating Vectorized Lexical Constraints for Neural Machine Translation
Shuo Wang | Zhixing Tan | Yang Liu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Lexically constrained neural machine translation (NMT), which controls the generation of NMT models with pre-specified constraints, is important in many practical scenarios. Due to the representation gap between discrete constraints and continuous vectors in NMT models, most existing works choose to construct synthetic data or modify the decoding algorithm to impose lexical constraints, treating the NMT model as a black box. In this work, we propose to open this black box by directly integrating the constraints into NMT models. Specifically, we vectorize source and target constraints into continuous keys and values, which can be utilized by the attention modules of NMT models. The proposed integration method is based on the assumption that the correspondence between keys and values in attention modules is naturally suitable for modeling constraint pairs. Experimental results show that our method consistently outperforms several representative baselines on four language pairs, demonstrating the superiority of integrating vectorized lexical constraints.

Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
Yulan He | Heng Ji | Sujian Li | Yang Liu | Chua-Hui Chang

Proceedings of the 21st Chinese National Conference on Computational Linguistics
Maosong Sun (孙茂松) | Yang Liu (刘洋) | Wanxiang Che (车万翔) | Yang Feng (冯洋) | Xipeng Qiu (邱锡鹏) | Gaoqi Rao (饶高琦) | Yubo Chen (陈玉博)

Context-Situated Pun Generation
Jiao Sun | Anjali Narayan-Chen | Shereen Oraby | Shuyang Gao | Tagyoung Chung | Jing Huang | Yang Liu | Nanyun Peng
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Previous work on pun generation commonly begins with a given pun word (a pair of homophones for heterographic pun generation and a polyseme for homographic pun generation) and seeks to generate an appropriate pun. While this may enable efficient pun generation, we believe that a pun is most entertaining if it fits appropriately within a given context, e.g., a given situation or dialogue. In this work, we propose a new task, context-situated pun generation, where a specific context represented by a set of keywords is provided, and the task is to first identify suitable pun words that are appropriate for the context, then generate puns based on the context keywords and the identified pun words. We collect a new dataset, CUP (Context-sitUated Pun), containing 4.5k tuples of context words and pun pairs. Based on the new data and setup, we propose a pipeline system for context-situated pun generation, including a pun word retrieval module that identifies suitable pun words for a given context, and a pun generation module that generates puns from context keywords and pun words. Human evaluation shows that 69% of our top retrieved pun words can be used to generate context-situated puns, and our generation module yields successful puns 31% of the time given a plausible tuple of context words and pun pair, almost tripling the yield of a state-of-the-art pun generation model. With an end-to-end evaluation, our pipeline system with the top-1 retrieved pun pair for a given context can generate successful puns 40% of the time, better than all other modeling variations but 32% lower than the human success rate. This highlights the difficulty of the task, and encourages more research in this direction.

End-to-End Unsupervised Vision-and-Language Pre-training with Referring Expression Matching
Chi Chen | Peng Li | Maosong Sun | Yang Liu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recently there has been an emerging interest in unsupervised vision-and-language pre-training (VLP) that learns multimodal representations without parallel image-caption data. These pioneering works significantly reduce the cost of VLP on data collection and achieve promising results compared to supervised VLP. However, existing unsupervised VLP methods take as input pre-extracted region-based visual features from external object detectors, which both limits flexibility and reduces computational efficiency. In this paper, we explore end-to-end unsupervised VLP with a vision encoder to directly encode images. The vision encoder is pre-trained on image-only data and jointly optimized during multimodal pre-training. To further enhance the learned cross-modal features, we propose a novel pre-training task that predicts which patches contain an object referred to in natural language from the encoded visual features. Extensive experiments on four vision-and-language tasks show that our approach outperforms previous unsupervised VLP methods and obtains new state-of-the-art results.


Proceedings of the 20th Chinese National Conference on Computational Linguistics
Sheng Li (李生) | Maosong Sun (孙茂松) | Yang Liu (刘洋) | Hua Wu (吴华) | Kang Liu (刘康) | Wanxiang Che (车万翔) | Shizhu He (何世柱) | Gaoqi Rao (饶高琦)

中美学者学术英语写作中词汇难度特征比较研究——以计算语言学领域论文为例(A Comparative Study of the Features of Lexical Sophistication in Academic English Writing by Chinese and American)
Yonghui Xie (谢永慧) | Yang Liu (刘洋) | Erhong Yang (杨尔弘) | Liner Yang (杨麟儿)
Proceedings of the 20th Chinese National Conference on Computational Linguistics


Mask-Align: Self-Supervised Neural Word Alignment
Chi Chen | Maosong Sun | Yang Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks. Current unsupervised neural alignment methods focus on inducing alignments from neural machine translation models, which do not leverage the full context in the target sequence. In this paper, we propose Mask-Align, a self-supervised word alignment model that takes advantage of the full context on the target side. Our model masks out each target token and predicts it conditioned on both source and the remaining target tokens. This two-step process is based on the assumption that the source token contributing most to recovering the masked target token should be aligned. We also introduce an attention variant called leaky attention, which alleviates the problem of unexpectedly high cross-attention weights on special tokens such as periods. Experiments on four language pairs show that our model outperforms previous unsupervised neural aligners and obtains new state-of-the-art results.
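
The alignment assumption in this abstract (link each masked target token to the source token that contributes most to recovering it) can be illustrated with a minimal sketch. This is not the Mask-Align implementation: the `weights` matrix, the `induce_alignment` name, and the optional `threshold` are hypothetical, and the scores are taken as given rather than produced by a model.

```python
from typing import List, Tuple


def induce_alignment(
    weights: List[List[float]], threshold: float = 0.0
) -> List[Tuple[int, int]]:
    """Return (source_index, target_index) links.

    weights[j][i] is a hypothetical score for how much source token i
    contributes to recovering masked target token j.
    """
    links = []
    for j, row in enumerate(weights):
        # Pick the source position with the largest contribution.
        best_i = max(range(len(row)), key=lambda i: row[i])
        # Keep the link only if the score clears the (assumed) threshold.
        if row[best_i] > threshold:
            links.append((best_i, j))
    return links


# Toy scores for 3 target tokens (rows) over 3 source tokens (columns).
weights = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.6, 0.2],
]
print(induce_alignment(weights))  # [(0, 0), (2, 1), (1, 2)]
```

Raising `threshold` drops low-confidence links, which is one simple way to trade recall for precision in a sketch like this.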

Transfer Learning for Sequence Generation: from Single-source to Multi-source
Xuancheng Huang | Jingfang Xu | Maosong Sun | Yang Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Multi-source sequence generation (MSG) is an important class of sequence generation tasks that take multiple sources as input, including automatic post-editing, multi-source translation, and multi-document summarization. As MSG tasks suffer from the data scarcity problem and recent pretrained models have been proven to be effective for low-resource downstream tasks, transferring pretrained sequence-to-sequence models to MSG tasks is essential. Although directly finetuning pretrained models on MSG tasks and concatenating multiple sources into a single long sequence is regarded as a simple method to transfer pretrained models to MSG tasks, we conjecture that the direct finetuning method leads to catastrophic forgetting and solely relying on pretrained self-attention layers to capture cross-source information is not sufficient. Therefore, we propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder to learn better representations in MSG tasks. Experiments show that our approach achieves new state-of-the-art results on the WMT17 APE task and multi-source translation task using the WMT14 test set. When adapted to document-level translation, our framework outperforms strong baselines significantly.

Alternated Training with Synthetic and Authentic Data for Neural Machine Translation
Rui Jiao | Zonghan Yang | Maosong Sun | Yang Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

On the Language Coverage Bias for Neural Machine Translation
Shuo Wang | Zhaopeng Tu | Zhixing Tan | Shuming Shi | Maosong Sun | Yang Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Knowledge Representation Learning with Contrastive Completion Coding
Bo Ouyang | Wenbing Huang | Runfa Chen | Zhixing Tan | Yang Liu | Maosong Sun | Jihong Zhu
Findings of the Association for Computational Linguistics: EMNLP 2021

Knowledge representation learning (KRL) has been used in plenty of knowledge-driven tasks. Despite fruitful progress, existing methods still struggle with potentially imperfect knowledge graphs and highly imbalanced positive-negative instances during training, both of which hinder the performance of KRL. In this paper, we propose Contrastive Completion Coding (C3), a novel KRL framework that is composed of two functional components: 1. Hierarchical Architecture, which integrates both low-level standalone features and high-level topology-aware features to yield robust embedding for each entity/relation. 2. Normalized Contrastive Training, which conducts normalized one-to-many contrastive learning to emphasize different negatives with different weights, delivering better convergence compared to conventional training losses. Extensive experiments on several benchmarks verify the efficacy of the two proposed techniques, and combining them generally achieves superior performance against state-of-the-art approaches.

Segment, Mask, and Predict: Augmenting Chinese Word Segmentation with Self-Supervision
Mieradilijiang Maimaiti | Yang Liu | Yuanhang Zheng | Gang Chen | Kaiyu Huang | Ji Zhang | Huanbo Luan | Maosong Sun
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent state-of-the-art (SOTA) neural network methods and fine-tuning methods based on pre-trained models (PTM) have been used in Chinese word segmentation (CWS), and they achieve great results. However, previous works focus on training the models with a fixed corpus at every iteration, although the intermediate generated information is also valuable. Besides, the robustness of previous neural methods is limited by the large-scale annotated data, which contains some noise. Limited efforts have been made by previous studies to deal with such problems. In this work, we propose a self-supervised CWS approach with a straightforward and effective architecture. First, we train a word segmentation model and use it to generate the segmentation results. Then, we use a revised masked language model (MLM) to evaluate the quality of the segmentation results based on the predictions of the MLM. Finally, we leverage the evaluations to aid the training of the segmenter by improved minimum risk training. Experimental results show that our approach outperforms previous methods on 9 different CWS datasets with single criterion training and multiple criteria training and achieves better robustness.

Self-Supervised Quality Estimation for Machine Translation
Yuanhang Zheng | Zhixing Tan | Meng Zhang | Mieradilijiang Maimaiti | Huanbo Luan | Maosong Sun | Qun Liu | Yang Liu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Quality estimation (QE) of machine translation (MT) aims to evaluate the quality of machine-translated sentences without references and is important in practical applications of MT. Training QE models requires massive parallel data with hand-crafted quality annotations, which are time-consuming and labor-intensive to obtain. To address the issue of the absence of annotated training data, previous studies attempt to develop unsupervised QE methods. However, very few of them can be applied to both sentence- and word-level QE tasks, and they may suffer from noise in the synthetic data. To reduce the negative impact of noise, we propose a self-supervised method for both sentence- and word-level QE, which performs quality estimation by recovering the masked target words. Experimental results show that our method outperforms previous unsupervised methods on several QE tasks in different language pairs and domains.


THUMT: An Open-Source Toolkit for Neural Machine Translation
Zhixing Tan | Jiacheng Zhang | Xuancheng Huang | Gang Chen | Shuo Wang | Maosong Sun | Huanbo Luan | Yang Liu
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Accurate Word Alignment Induction from Neural Machine Translation
Yun Chen | Yang Liu | Guanhua Chen | Xin Jiang | Qun Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Despite its original goal to jointly learn to align and translate, prior research suggests that Transformer captures poor word alignments through its attention mechanism. In this paper, we show that attention weights do capture accurate word alignments and propose two novel word alignment induction methods, Shift-Att and Shift-AET. The main idea is to induce alignments at the step when the to-be-aligned target token is the decoder input rather than the decoder output as in previous work. Shift-Att is an interpretation method that induces alignments from the attention weights of Transformer and does not require parameter update or architecture change. Shift-AET extracts alignments from an additional alignment module which is tightly integrated into Transformer and trained in isolation with supervision from symmetrized Shift-Att alignments. Experiments on three publicly available datasets demonstrate that both methods perform better than their corresponding neural baselines and Shift-AET significantly outperforms GIZA++ by 1.4-4.8 AER points.
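
The "decoder input rather than decoder output" idea in this abstract can be shown with a small indexing sketch. It is an illustration under stated assumptions, not the authors' code: `attn[t]` is a hypothetical cross-attention distribution at decoder step `t`, where step `t` reads target token `t-1` (or a start symbol) as input and emits target token `t`; the function names are made up for this example.

```python
from typing import List, Tuple

Attention = List[List[float]]  # attn[step][source_position]


def _argmax(row: List[float]) -> int:
    return max(range(len(row)), key=lambda i: row[i])


def align_at_output(attn: Attention, num_target: int) -> List[Tuple[int, int]]:
    """Baseline view: align target token j using the step that emits it."""
    return [(_argmax(attn[j]), j) for j in range(num_target)]


def align_at_input(attn: Attention, num_target: int) -> List[Tuple[int, int]]:
    """Shifted view: align target token j using step j + 1, where it is the input.

    Needs one extra step (the one that emits end-of-sentence) so that the
    last target token also appears as a decoder input.
    """
    if len(attn) < num_target + 1:
        raise ValueError("need attention for num_target + 1 decoder steps")
    return [(_argmax(attn[j + 1]), j) for j in range(num_target)]


# Toy attention for 3 target tokens plus the end-of-sentence step.
attn = [
    [0.8, 0.1, 0.1],  # step 0: input <s>, output y0
    [0.6, 0.3, 0.1],  # step 1: input y0,  output y1
    [0.1, 0.2, 0.7],  # step 2: input y1,  output y2
    [0.2, 0.7, 0.1],  # step 3: input y2,  output </s>
]
print(align_at_output(attn, 3))
print(align_at_input(attn, 3))
```

The two functions differ only in which row they read for a given target token, which is the whole point of the shift.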

Proceedings of the First Workshop on Automatic Simultaneous Translation
Hua Wu | Colin Cherry | Liang Huang | Zhongjun He | Mark Liberman | James Cross | Yang Liu

Proceedings of the 19th Chinese National Conference on Computational Linguistics
Maosong Sun (孙茂松) | Sujian Li (李素建) | Yue Zhang (张岳) | Yang Liu (刘洋)


Shared-Private Bilingual Word Embeddings for Neural Machine Translation
Xuebo Liu | Derek F. Wong | Yang Liu | Lidia S. Chao | Tong Xiao | Jingbo Zhu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embedding is central to neural machine translation (NMT), which has attracted intensive research interest in recent years. In NMT, the source embedding plays the role of the entrance while the target embedding acts as the terminal. These layers occupy most of the model parameters for representation learning. Furthermore, they indirectly interface via a soft-attention mechanism, which makes them comparatively isolated. In this paper, we propose shared-private bilingual word embeddings, which give a closer relationship between the source and target embeddings, and which also reduce the number of model parameters. For similar source and target words, their embeddings tend to share a part of the features and they cooperatively learn these common representation units. Experiments on 5 language pairs belonging to 6 different language families and written in 5 different alphabets demonstrate that the proposed model provides a significant performance boost over the strong baselines with dramatically fewer model parameters.

Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach
Zonghan Yang | Yong Cheng | Yang Liu | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While neural machine translation (NMT) has achieved remarkable success, NMT systems are prone to make word omission errors. In this work, we propose a contrastive learning approach to reducing word omission errors in NMT. The basic idea is to enable the NMT model to assign a higher probability to a ground-truth translation and a lower probability to an erroneous translation, which is automatically constructed from the ground-truth translation by omitting words. We design different types of negative examples depending on the number of omitted words, word frequency, and part of speech. Experiments on Chinese-to-English, German-to-English, and Russian-to-English translation tasks show that our approach is effective in reducing word omission errors and achieves better translation performance than three baseline methods.
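
A minimal sketch of the contrastive setup described in this abstract: build a negative example by omitting words from the ground-truth translation, then penalize the model when it does not score the ground truth sufficiently higher. The random choice of omitted positions, the hinge form of the loss, the `margin` value, and the function names are all assumptions made for illustration; the abstract also mentions frequency- and part-of-speech-based negatives, which are not shown here.

```python
import random
from typing import List


def omit_words(tokens: List[str], num_omit: int, rng: random.Random) -> List[str]:
    """Build a negative example by dropping num_omit randomly chosen tokens."""
    if not 0 < num_omit < len(tokens):
        raise ValueError("num_omit must be between 1 and len(tokens) - 1")
    dropped = set(rng.sample(range(len(tokens)), num_omit))
    # Keep the remaining tokens in their original order.
    return [tok for i, tok in enumerate(tokens) if i not in dropped]


def margin_loss(logp_truth: float, logp_negative: float, margin: float = 1.0) -> float:
    """Hinge loss: zero once the ground truth outscores the negative by margin."""
    return max(0.0, margin - (logp_truth - logp_negative))


rng = random.Random(0)
truth = "the cat sat on the mat".split()
negative = omit_words(truth, num_omit=2, rng=rng)
print(negative)

# Hypothetical model log-probabilities for the two translations.
print(margin_loss(logp_truth=-2.0, logp_negative=-5.0))  # gap of 3.0 exceeds margin
print(margin_loss(logp_truth=-2.0, logp_negative=-2.5))  # gap of 0.5 is too small
```

In a real system the two log-probabilities would come from the NMT model being fine-tuned; here they are plain numbers so the loss behavior is easy to inspect.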


Error Analysis of Uyghur Name Tagging: Language-specific Techniques and Remaining Challenges
Halidanmu Abudukelimu | Abudoukelimu Abulizi | Boliang Zhang | Xiaoman Pan | Di Lu | Heng Ji | Yang Liu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Neural Network Methods for Natural Language Processing by Yoav Goldberg
Yang Liu | Meng Zhang
Computational Linguistics, Volume 44, Issue 1 - April 2018

Towards Robust Neural Machine Translation
Yong Cheng | Zhaopeng Tu | Fandong Meng | Junjie Zhai | Yang Liu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Small perturbations in the input can severely distort intermediate representations and thus impact translation quality of neural machine translation (NMT) models. In this paper, we propose to improve the robustness of NMT models with adversarial stability training. The basic idea is to make both the encoder and decoder in NMT models robust against input perturbations by enabling them to behave similarly for the original input and its perturbed counterpart. Experimental results on Chinese-English, English-German and English-French translation tasks show that our approaches can not only achieve significant improvements over strong NMT systems but also improve the robustness of NMT models.

Multi-Domain Neural Machine Translation with Word-Level Domain Context Discrimination
Jiali Zeng | Jinsong Su | Huating Wen | Yang Liu | Jun Xie | Yongjing Yin | Jianqiang Zhao
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

With great practical value, the study of Multi-domain Neural Machine Translation (NMT) mainly focuses on using mixed-domain parallel sentences to construct a unified model that allows translation to switch between different domains. Intuitively, words in a sentence are related to its domain to varying degrees, so they exert disparate impacts on multi-domain NMT modeling. Based on this intuition, in this paper, we focus on distinguishing and exploiting word-level domain contexts for multi-domain NMT. To this end, we jointly model NMT with monolingual attention-based domain classification tasks and improve NMT as follows: 1) Based on the sentence representations produced by a domain classifier and an adversarial domain classifier, we generate two gating vectors and use them to construct domain-specific and domain-shared annotations, for later translation predictions via different attention models; 2) We utilize the attention weights derived from the target-side domain classifier to adjust the weights of target words in the training objective, enabling domain-related words to have greater impacts during model training. Experimental results on Chinese-English and English-French multi-domain translation tasks demonstrate the effectiveness of the proposed model. Source code for this paper is available on GitHub: https://github.com/DeepLearnXMU/WDCNMT.

Improving the Transformer Translation Model with Document-Level Context
Jiacheng Zhang | Huanbo Luan | Maosong Sun | Feifei Zhai | Jingfang Xu | Min Zhang | Yang Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Although the Transformer translation model (Vaswani et al., 2017) has achieved state-of-the-art performance in a variety of translation tasks, how to use document-level context to deal with discourse phenomena problematic for Transformer still remains a challenge. In this work, we extend the Transformer model with a new context encoder to represent document-level context, which is then incorporated into the original encoder and decoder. As large-scale document-level parallel corpora are usually not available, we introduce a two-step training method to take full advantage of abundant sentence-level parallel corpora and limited document-level parallel corpora. Experiments on the NIST Chinese-English datasets and the IWSLT French-English datasets show that our approach improves over Transformer significantly.

Learning to Remember Translation History with a Continuous Cache
Zhaopeng Tu | Yang Liu | Shuming Shi | Tong Zhang
Transactions of the Association for Computational Linguistics, Volume 6

Existing neural machine translation (NMT) models generally translate sentences in isolation, missing the opportunity to take advantage of document-level information. In this work, we propose to augment NMT models with a very light-weight cache-like memory network, which stores recent hidden representations as translation history. The probability distribution over generated words is updated online depending on the translation history retrieved from the memory, endowing NMT models with the capability to dynamically adapt over time. Experiments on multiple domains with different topics and styles show the effectiveness of the proposed approach with negligible impact on the computational cost.


Context Gates for Neural Machine Translation
Zhaopeng Tu | Yang Liu | Zhengdong Lu | Xiaohua Liu | Hang Li
Transactions of the Association for Computational Linguistics, Volume 5

In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect the fluency. Intuitively, generation of a content word should rely more on the source context and generation of a functional word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points.
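
The gating idea in this abstract (a learned ratio that decides how much source context versus target context feeds the next word) can be sketched as an element-wise blend. This is a simplified illustration, not the paper's exact formulation: the gate pre-activations are taken as given inputs instead of being computed from model states, and the names `context_gate` and `gate_logits` are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def context_gate(
    gate_logits: List[float], source_ctx: List[float], target_ctx: List[float]
) -> List[float]:
    """Blend source and target context vectors element-wise.

    Each gate value z in (0, 1) weights the source context by z and the
    target context by (1 - z). The logits are assumed to come from a
    learned layer that is not modeled here.
    """
    if not (len(gate_logits) == len(source_ctx) == len(target_ctx)):
        raise ValueError("all vectors must have the same length")
    gates = [sigmoid(g) for g in gate_logits]
    return [z * s + (1.0 - z) * t for z, s, t in zip(gates, source_ctx, target_ctx)]


source_ctx = [1.0, 2.0]
target_ctx = [3.0, 4.0]

# Zero logits give z = 0.5, an even mix of the two contexts.
print(context_gate([0.0, 0.0], source_ctx, target_ctx))  # [2.0, 3.0]

# A large positive logit leans on the source context, a large negative one
# on the target context.
print(context_gate([10.0, -10.0], source_ctx, target_ctx))
```

Under this reading, a content word would be generated with gates closer to 1 (more source context) and a function word with gates closer to 0 (more target context).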

Visualizing and Understanding Neural Machine Translation
Yanzhuo Ding | Yang Liu | Huanbo Luan | Maosong Sun
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While neural machine translation (NMT) has made remarkable progress in recent years, it is hard to interpret its internal workings due to the continuous representations and non-linearity of neural networks. In this work, we propose to use layer-wise relevance propagation (LRP) to compute the contribution of each contextual word to arbitrary hidden states in the attention-based encoder-decoder framework. We show that visualization with LRP helps to interpret the internal workings of NMT and analyze translation errors.

Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization
Jiacheng Zhang | Yang Liu | Huanbo Luan | Jingfang Xu | Maosong Sun
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although neural machine translation has made significant progress recently, how to integrate multiple overlapping, arbitrary prior knowledge sources remains a challenge. In this work, we propose to use posterior regularization to provide a general framework for integrating prior knowledge into neural machine translation. We represent prior knowledge sources as features in a log-linear model, which guides the learning process of the neural translation model. Experiments on a Chinese-English dataset show that our approach leads to significant improvements.

A Teacher-Student Framework for Zero-Resource Neural Machine Translation
Yun Chen | Yang Liu | Yong Cheng | Victor O.K. Li
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on this assumption, our method is able to train a source-to-target NMT model (“student”) without any source-target parallel corpora, guided by an existing pivot-to-target NMT model (“teacher”) on a source-pivot parallel corpus. Experimental results show that the proposed method significantly improves over a baseline pivot-based model by +3.0 BLEU points across various language pairs.

Adversarial Training for Unsupervised Bilingual Lexicon Induction
Meng Zhang | Yang Liu | Huanbo Luan | Maosong Sun
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Word embeddings are well known to capture linguistic regularities of the language on which they are trained. Researchers also observe that these regularities can transfer across languages. However, previous endeavors to connect separate monolingual word embeddings typically require cross-lingual signals as supervision, either in the form of parallel corpus or seed lexicon. In this work, we show that such cross-lingual connection can actually be established without any form of supervision. We achieve this end by formulating the problem as a natural adversarial game, and investigating techniques that are crucial to successful training. We carry out evaluation on the unsupervised bilingual lexicon induction task. Even though this task appears intrinsically cross-lingual, we are able to demonstrate encouraging performance without any cross-lingual clues.

Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction
Meng Zhang | Yang Liu | Huanbo Luan | Maosong Sun
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Cross-lingual natural language processing hinges on the premise that there exists invariance across languages. At the word level, researchers have identified such invariance in the word embedding semantic spaces of different languages. However, in order to connect the separate spaces, cross-lingual supervision encoded in parallel data is typically required. In this paper, we attempt to establish the cross-lingual connection without relying on any cross-lingual supervision. By viewing word embedding spaces as distributions, we propose to minimize their earth mover’s distance, a measure of divergence between distributions. We demonstrate the success on the unsupervised bilingual lexicon induction task. In addition, we reveal an interesting finding that the earth mover’s distance shows potential as a measure of language difference.


Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover’s Distance Regularization
Meng Zhang | Yang Liu | Huanbo Luan | Yiqun Liu | Maosong Sun
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Being able to induce word translations from non-parallel data is often a prerequisite for cross-lingual processing in resource-scarce languages and domains. Previous endeavors typically simplify this task by imposing the one-to-one translation assumption, which is too strong to hold for natural languages. We remove this constraint by introducing the Earth Mover’s Distance into the training of bilingual word embeddings. In this way, we take advantage of its capability to handle multiple alternative word translations in a natural form of regularization. Our approach shows significant and consistent improvements across four language pairs. We also demonstrate that our approach is particularly preferable in resource-scarce settings as it only requires a minimal seed lexicon.

Modeling Coverage for Neural Machine Translation
Zhaopeng Tu | Zhengdong Lu | Yang Liu | Xiaohua Liu | Hang Li
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Agreement-based Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora
Chunyang Liu | Yang Liu | Maosong Sun | Huanbo Luan | Heng Yu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Minimum Risk Training for Neural Machine Translation
Shiqi Shen | Yong Cheng | Zhongjun He | Wei He | Hua Wu | Maosong Sun | Yang Liu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Semi-Supervised Learning for Neural Machine Translation
Yong Cheng | Wei Xu | Zhongjun He | Wei He | Hua Wu | Maosong Sun | Yang Liu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


Consistency-Aware Search for Word Alignment
Shiqi Shen | Yang Liu | Maosong Sun | Huanbo Luan
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation
Jinsong Su | Deyi Xiong | Biao Zhang | Yang Liu | Junfeng Yao | Min Zhang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Generalized Agreement for Bidirectional Word Alignment
Chunyang Liu | Yang Liu | Maosong Sun | Huanbo Luan | Heng Yu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

A Context-Aware Topic Model for Statistical Machine Translation
Jinsong Su | Deyi Xiong | Yang Liu | Xianpei Han | Hongyu Lin | Junfeng Yao | Min Zhang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Learning Cross-lingual Word Embeddings via Matrix Co-factorization
Tianze Shi | Zhiyuan Liu | Yang Liu | Maosong Sun
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)


Large-scale CCG Induction from the Groningen Meaning Bank
Sebastian Beschke | Yang Liu | Wolfgang Menzel
Proceedings of the ACL 2014 Workshop on Semantic Parsing

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: Tutorials
Alex Fraser | Yang Liu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: Tutorials

A Neural Reordering Model for Phrase-based Translation
Peng Li | Yang Liu | Maosong Sun | Tatsuya Izuha | Dakun Zhang
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Query Lattice for Translation Retrieval
Meiping Dong | Yong Cheng | Yang Liu | Jia Xu | Maosong Sun | Tatsuya Izuha | Jie Hao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers


A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
Yang Liu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization
Guangyou Zhou | Fang Liu | Yang Liu | Shizhu He | Jun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recursive Autoencoders for ITG-Based Translation
Peng Li | Yang Liu | Maosong Sun
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


Left-to-Right Tree-to-String Decoding with Prediction
Yang Feng | Yang Liu | Qun Liu | Trevor Cohn
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Unsupervised Discriminative Induction of Synchronous Grammar for Machine Translation
Xinyan Xiao | Deyi Xiong | Yang Liu | Qun Liu | Shouxun Lin
Proceedings of COLING 2012

A Beam Search Algorithm for ITG Word Alignment
Peng Li | Yang Liu | Maosong Sun
Proceedings of COLING 2012: Posters

Combining Multiple Alignments to Improve Machine Translation
Zhaopeng Tu | Yang Liu | Yifan He | Josef van Genabith | Qun Liu | Shouxun Lin
Proceedings of COLING 2012: Posters

THUTR: A Translation Retrieval System
Chunyang Liu | Qi Liu | Yang Liu | Maosong Sun
Proceedings of COLING 2012: Demonstration Papers


Extracting Hierarchical Rules from a Weighted Alignment Matrix
Zhaopeng Tu | Yang Liu | Qun Liu | Shouxun Lin
Proceedings of 5th International Joint Conference on Natural Language Processing

Maximum Rank Correlation Training for Statistical Machine Translation
Daqi Zheng | Yifan He | Yang Liu | Qun Liu
Proceedings of Machine Translation Summit XIII: Papers

ETS: An Error Tolerable System for Coreference Resolution
Hao Xiong | Linfeng Song | Fandong Meng | Yang Liu | Qun Liu | Yajuan Lv
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

Adjoining Tree-to-String Translation
Yang Liu | Qun Liu | Yajuan Lü
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
Xinyan Xiao | Yang Liu | Qun Liu | Shouxun Lin
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing


Discriminative Word Alignment by Linear Modeling
Yang Liu | Qun Liu | Shouxun Lin
Computational Linguistics, Volume 36, Issue 3 - September 2010

Learning Lexicalized Reordering Models from Reordering Graphs
Jinsong Su | Yang Liu | Yajuan Lv | Haitao Mi | Qun Liu
Proceedings of the ACL 2010 Conference Short Papers

Tree-Based and Forest-Based Translation
Yang Liu | Liang Huang
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Joint Parsing and Translation
Yang Liu | Qun Liu
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Dependency Forest for Statistical Machine Translation
Zhaopeng Tu | Yang Liu | Young-Sook Hwang | Qun Liu | Shouxun Lin
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Joint Tokenization and Translation
Xinyan Xiao | Yang Liu | Young-Sook Hwang | Qun Liu | Shouxun Lin
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

An Efficient Shift-Reduce Decoding Algorithm for Phrased-Based Machine Translation
Yang Feng | Haitao Mi | Yang Liu | Qun Liu
Coling 2010: Posters

Effective Constituent Projection across Languages
Wenbin Jiang | Yajuan Lv | Yang Liu | Qun Liu
Coling 2010: Posters

Dependency-Based Bracketing Transduction Grammar for Statistical Machine Translation
Jinsong Su | Yang Liu | Haitao Mi | Hongmei Zhao | Yajuan Lv | Qun Liu
Coling 2010: Posters

The ICT statistical machine translation system for IWSLT 2010
Hao Xiong | Jun Xie | Hui Yu | Kai Liu | Wei Luo | Haitao Mi | Yang Liu | Yajuan Lü | Qun Liu
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

Statistical Translation Model Based On Source Syntax Structure
Qun Liu | Yang Liu | Haitao Mi
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation


Weighted Alignment Matrices for Statistical Machine Translation
Yang Liu | Tian Xia | Xinyan Xiao | Qun Liu
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Lattice-based System Combination for Statistical Machine Translation
Yang Feng | Yang Liu | Haitao Mi | Qun Liu | Yajuan Lü
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Improving Tree-to-Tree Translation with Packed Forests
Yang Liu | Yajuan Lü | Qun Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Joint Decoding with Multiple Translation Models
Yang Liu | Haitao Mi | Yang Feng | Qun Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Sub-Sentence Division for Tree-Based Machine Translation
Hao Xiong | Wenwen Xu | Haitao Mi | Yang Liu | Qun Liu
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers


The ICT system description for IWSLT 2008
Yang Liu | Zhongjun He | Haitao Mi | Yun Huang | Yang Feng | Wenbin Jiang | Yajuan Lu | Qun Liu
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the ICT systems involved in the IWSLT 2008 evaluation campaign. This year, we participated in the Chinese-English and English-Chinese translation directions. Four statistical machine translation systems were used: one linguistically syntax-based, two formally syntax-based, and one phrase-based. The outputs of the four SMT systems were fed to a sentence-level system combiner, which was expected to produce better translations than the single systems. We report the results of the four single systems and the combiner on both the development and test sets.

Maximum Entropy based Rule Selection Model for Syntax-based Statistical Machine Translation
Qun Liu | Zhongjun He | Yang Liu | Shouxun Lin
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing


Forest-to-String Statistical Translation Rules
Yang Liu | Yun Huang | Qun Liu | Shouxun Lin
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

The ICT statistical machine translation systems for IWSLT 2007
Zhongjun He | Haitao Mi | Yang Liu | Deyi Xiong | Weihua Luo | Yun Huang | Zhixiang Ren | Yajuan Lu | Qun Liu
Proceedings of the Fourth International Workshop on Spoken Language Translation

In this paper, we give an overview of the ICT statistical machine translation systems for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2007. In this year’s evaluation, we participated in the Chinese-English transcript translation task and developed three systems based on different techniques: a formally syntax-based system, Bruin; an extended phrase-based system, Confucius; and a linguistically syntax-based system, Lynx. We describe the models of these three systems and compare their performance in detail. We set Bruin as our primary system, which ranked second among the 15 primary results according to the official evaluation.


Tree-to-String Alignment Template for Statistical Machine Translation
Yang Liu | Qun Liu | Shouxun Lin
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics


Log-Linear Models for Word Alignment
Yang Liu | Qun Liu | Shouxun Lin
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)


Building a Bilingual WordNet-Like Lexicon: The New Approach and Algorithms
Yang Liu | Shiwen Yu | Jiangsheng Yu
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes
