Mingzhou Xu
2021
Document Graph for Neural Machine Translation
Mingzhou Xu | Liangyou Li | Derek F. Wong | Qun Liu | Lidia S. Chao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Previous works have shown that contextual information can improve the performance of neural machine translation (NMT). However, most existing document-level NMT methods fail to leverage contexts beyond a small set of previous sentences. How to make use of the whole document as global context remains a challenge. To address this issue, we hypothesize that a document can be represented as a graph that connects relevant contexts regardless of their distances. We employ several types of relations, including adjacency, syntactic dependency, lexical consistency, and coreference, to construct the document graph. Then, we incorporate both source and target graphs into the conventional Transformer architecture with graph convolutional networks. Experiments on various NMT benchmarks, including IWSLT English–French, Chinese–English, WMT English–German, and OpenSubtitles English–Russian, demonstrate that using document graphs can significantly improve translation quality. Extensive analysis verifies that the document graph is beneficial for capturing discourse phenomena.
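The abstract above describes propagating contextual information along document-graph edges with graph convolutional networks. As a minimal sketch (not the paper's implementation), the standard GCN update over a small document graph, where nodes are tokens and edges encode relations such as adjacency or coreference, looks like this; the adjacency matrix and dimensions here are hypothetical toy values:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy document graph over 4 token nodes; edges could mark adjacency,
# lexical consistency, or coreference links (hypothetical example).
np.random.seed(0)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
H = np.random.rand(4, 8)   # token representations from the encoder
W = np.random.rand(8, 8)   # learnable projection
H_graph = gcn_layer(H, A, W)
print(H_graph.shape)  # (4, 8)
```

In the paper's setting, the graph-enhanced representations would be combined with the Transformer's own hidden states; how that combination is done is not specified in the abstract.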
2019
Leveraging Local and Global Patterns for Self-Attention Networks
Mingzhou Xu | Derek F. Wong | Baosong Yang | Yue Zhang | Lidia S. Chao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Self-attention networks have received increasing research attention. By default, the hidden states of each word are hierarchically calculated by attending to all words in the sentence, which assembles global information. However, several studies have pointed out that taking all signals into account may lead to overlooking neighboring information (e.g., phrase patterns). To address this issue, we propose a hybrid attention mechanism that dynamically leverages both local and global information. Specifically, our approach uses a gating scalar to integrate the two sources of information, which also makes it convenient to quantify their contributions. Experiments on various neural machine translation tasks demonstrate the effectiveness of the proposed method. Extensive analyses verify that the two types of context are complementary, and that our method integrates them effectively.
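The abstract above describes blending a locally windowed attention view with the default global view via a gating scalar. The sketch below illustrates the general idea under stated assumptions: the window size, the query-derived sigmoid gate, and all dimensions are hypothetical stand-ins, not the paper's learned parameterization:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention(Q, K, V, window=1):
    """Blend global attention with a locally masked view using a per-query
    gate in (0, 1). The gate here is a sigmoid of the query mean, a
    hypothetical stand-in for the paper's learned gating scalar."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    global_out = softmax(scores) @ V
    # Local view: mask out positions farther than `window` from the query
    mask = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]) > window
    local_out = softmax(np.where(mask, -1e9, scores)) @ V
    gate = 1.0 / (1.0 + np.exp(-Q.mean(axis=1, keepdims=True)))
    return gate * local_out + (1.0 - gate) * global_out

# Toy usage with random queries/keys/values for a 5-token sentence.
np.random.seed(1)
Q = np.random.rand(5, 4)
K = np.random.rand(5, 4)
V = np.random.rand(5, 4)
out = hybrid_attention(Q, K, V, window=1)
print(out.shape)  # (5, 4)
```

Because the output is a convex combination of the two attention views, the gate value directly quantifies how much each query relies on local versus global context.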
Co-authors
- Derek F. Wong 2
- Lidia S. Chao 2
- Baosong Yang 1
- Yue Zhang 1
- Liangyou Li 1
- Qun Liu 1