Manifold’s English-Chinese System at WMT22 General MT Task

Chang Jin (金畅); Tingxun Shi; Zhengshan Xue; Xiaodong Lin

Abstract

Manifold’s English-Chinese system at WMT22 is an ensemble of four models, each trained under a different configuration and then fine-tuned with scheduled sampling. The four configurations are DeepBig (XenC), DeepLarger (XenC), DeepBig-TalkingHeads (XenC) and DeepBig (LaBSE). DeepBig extends Transformer-Big to 24 encoder layers. DeepLarger has 20 encoder layers and a feed-forward network (FFN) dimension of 8192. TalkingHeads applies talking-heads attention. For the XenC configurations, we used XenC to select monolingual and parallel data similar to past newstest sets. For the LaBSE configuration, we cleaned the officially provided parallel data with the pretrained LaBSE model. According to the officially released automatic-metrics leaderboard, our final constrained system ranked 1st among all other systems by BLEU-all, chrF-all and COMET-B, and 2nd by COMET-A.
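The abstract does not say how the XenC selection was configured. One of the modes XenC offers is Moore-Lewis cross-entropy difference. In that mode, each candidate sentence s is scored by H_in(s) - H_gen(s), the per-token cross-entropy under an in-domain LM minus that under a general-domain LM. Lower scores are kept. The sketch below illustrates only that criterion, and it rests on three assumptions: add-one smoothed unigram LMs stand in for the n-gram LMs a real tool would train, toy sentences stand in for past newstest data, and the helper names `UnigramLM` and `select_by_ce_difference` are hypothetical. It is not the authors' pipeline.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one (Laplace) smoothed unigram LM over whitespace tokens.

    Hypothetical toy stand-in for the n-gram LMs a real selection tool trains.
    """

    def __init__(self, sentences, vocab_size):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def logprob(self, token):
        # Smoothing gives unseen tokens a small non-zero probability.
        return math.log((self.counts[token] + 1) / (self.total + self.vocab_size))

    def cross_entropy(self, sentence):
        # Per-token cross-entropy in nats; callers pass non-empty sentences.
        tokens = sentence.split()
        return -sum(self.logprob(t) for t in tokens) / len(tokens)


def select_by_ce_difference(candidates, in_domain, general, keep):
    """Keep the `keep` candidates with the lowest H_in(s) - H_gen(s).

    A low score means the sentence resembles the in-domain data (assumed
    here to play the role of past newstest sets) more than generic text.
    """
    candidates = [s for s in candidates if s.split()]  # drop empty lines
    vocab = {t for s in in_domain + general + candidates for t in s.split()}
    lm_in = UnigramLM(in_domain, len(vocab))
    lm_gen = UnigramLM(general, len(vocab))

    def score(s):
        return lm_in.cross_entropy(s) - lm_gen.cross_entropy(s)

    return sorted(candidates, key=score)[:keep]


if __name__ == "__main__":
    in_domain = ["the president said on monday",
                 "the government announced new measures"]
    general = ["i love my cat", "click here to buy cheap shoes",
               "the cat sat on the mat"]
    candidates = ["the president announced new measures",
                  "buy cheap shoes now", "my cat sat on the mat"]
    # prints ['the president announced new measures']
    print(select_by_ce_difference(candidates, in_domain, general, keep=1))
```

Subtracting the general-domain cross-entropy is what distinguishes this criterion from plain in-domain perplexity filtering. Without that term, short sentences built from frequent words would score well regardless of domain. In practice the cut-off (`keep` here) would be tuned on held-out data. The abstract does not report any threshold.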

Anthology ID:: 2022.wmt-1.20
Volume:: Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:: December
Year:: 2022
Address:: Abu Dhabi, United Arab Emirates (Hybrid)
Editors:: Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
Venue:: WMT
SIG:: SIGMT
Publisher:: Association for Computational Linguistics
Pages:: 275–279
URL:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.wmt-1.20/
Cite (ACL):: Chang Jin, Tingxun Shi, Zhengshan Xue, and Xiaodong Lin. 2022. Manifold’s English-Chinese System at WMT22 General MT Task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 275–279, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):: Manifold’s English-Chinese System at WMT22 General MT Task (Jin et al., WMT 2022)
PDF:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.wmt-1.20.pdf