Abstract
This paper describes Global Tone Communication Co., Ltd.’s submission to the WMT20 shared news translation task. We participate in four directions: English to Khmer, English to Pashto, Khmer to English, and Pashto to English. Our systems achieve the best BLEU scores among all participants in three directions: English to Pashto (13.1), Pashto to English (23.1), and Khmer to English (25.5). Our submitted systems are unconstrained and focus on mBART (Multilingual Bidirectional and Auto-Regressive Transformers), back-translation, and forward-translation. We also apply rules, a language model, and a RoBERTa model to filter monolingual, parallel, and synthetic sentences. In addition, we examine the difference between vocabularies built from monolingual data and from parallel data.
- Anthology ID:
- 2020.wmt-1.6
- Volume:
- Proceedings of the Fifth Conference on Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 100–104
- URL:
- https://aclanthology.org/2020.wmt-1.6
- Cite (ACL):
- Chao Bei, Hao Zong, Qingmin Liu, and Conghu Yuan. 2020. GTCOM Neural Machine Translation Systems for WMT20. In Proceedings of the Fifth Conference on Machine Translation, pages 100–104, Online. Association for Computational Linguistics.
- Cite (Informal):
- GTCOM Neural Machine Translation Systems for WMT20 (Bei et al., WMT 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.wmt-1.6.pdf
- Data
- FLoRes