Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings

Weixuan Wang, Wei Peng, Meng Zhang, Qun Liu


Abstract
Neural Machine Translation (NMT) has shown a strong ability to use local context to disambiguate the meaning of words. However, it remains a challenge for NMT to leverage broader context information such as topics. In this paper, we propose heterogeneous ways of embedding sentence-level topic information into an NMT model to improve translation performance. Specifically, the topic information can be incorporated as a pre-encoder topic embedding, a post-encoder topic embedding, or a decoder topic embedding to increase the likelihood of selecting target words from the same topic as the source sentence. Experimental results show that NMT models with the proposed topic knowledge embeddings outperform the baselines on the English -> German and English -> French translation tasks.
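The three injection points named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the topic vector is combined with the model states, so element-wise addition is an assumption here (concatenation or gating would be equally plausible). All names, dimensions, and the toy one-layer encoder are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
D_MODEL = 8    # model width
N_TOPICS = 4   # number of topics
VOCAB = 20     # shared toy vocabulary size

# Hypothetical lookup tables and a toy one-layer "encoder" weight.
token_emb = rng.normal(size=(VOCAB, D_MODEL))
topic_emb = rng.normal(size=(N_TOPICS, D_MODEL))
w_enc = rng.normal(size=(D_MODEL, D_MODEL))


def sentence_topic_vector(topic_dist):
    """Mix topic embeddings by a sentence-level topic distribution."""
    dist = np.asarray(topic_dist, dtype=float)
    dist = dist / dist.sum()  # normalise to a probability distribution
    return dist @ topic_emb   # shape: (D_MODEL,)


def toy_encoder(x):
    """Stand-in for a real encoder: one tanh-activated linear layer."""
    return np.tanh(x @ w_enc)


def pre_encoder_inject(src_ids, topic_vec):
    """Assumed variant 1: add the topic vector to source embeddings."""
    return token_emb[src_ids] + topic_vec


def post_encoder_inject(enc_states, topic_vec):
    """Assumed variant 2: add the topic vector to encoder outputs."""
    return enc_states + topic_vec


def decoder_inject(tgt_ids, topic_vec):
    """Assumed variant 3: add the topic vector to decoder input embeddings."""
    return token_emb[tgt_ids] + topic_vec


# Toy usage: one source sentence, one target prefix, one topic distribution.
src_ids = np.array([1, 2, 3])
tgt_ids = np.array([4, 5])
topic_vec = sentence_topic_vector([0.1, 0.7, 0.1, 0.1])

enc_in = pre_encoder_inject(src_ids, topic_vec)                       # (3, D_MODEL)
enc_out = post_encoder_inject(toy_encoder(token_emb[src_ids]), topic_vec)
dec_in = decoder_inject(tgt_ids, topic_vec)                           # (2, D_MODEL)
```

In each variant the same sentence-level vector is broadcast over every token position, which is one simple way a single topic signal could bias all positions of a sentence toward the same topic.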
Anthology ID:
2021.emnlp-main.256
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3197–3202
URL:
https://aclanthology.org/2021.emnlp-main.256
DOI:
10.18653/v1/2021.emnlp-main.256
Cite (ACL):
Weixuan Wang, Wei Peng, Meng Zhang, and Qun Liu. 2021. Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3197–3202, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings (Wang et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2021.emnlp-main.256.pdf
Video:
https://preview.aclanthology.org/remove-xml-comments/2021.emnlp-main.256.mp4
Code:
Vicky-Wil/topic-NMT
Data:
WMT 2014