HwTscSU’s Submissions on WAT 2022 Shared Task

Yilun Liu, Zhen Zhang, Shimin Tao, Junhui Li, Hao Yang


Abstract
In this paper we describe our submissions to the NICT–SAP shared tasks of the 9th Workshop on Asian Translation (WAT 2022) under the team name "HwTscSU". The tasks involve translation between English and 5 languages in two domains: the IT domain and the Wikinews (ALT) domain. The purpose is to determine the feasibility of multilingualism, domain adaptation, or document-level knowledge given very little to no clean parallel data for training. Our approach for all translation tasks mainly focused on pre-training NMT models on general-domain datasets and fine-tuning them on domain-specific datasets. Owing to the scarcity of parallel data, we collected and cleaned three IT-domain corpora from OPUS, namely GNOME, KDE4, and Ubuntu. We then trained Transformer models on the collected data and fine-tuned them on the corresponding dev sets. The BLEU scores improved greatly in comparison with other systems. Our submissions ranked 1st in all IT-domain tasks and in one out of eight ALT-domain tasks.
Anthology ID:
2022.wat-1.5
Volume:
Proceedings of the 9th Workshop on Asian Translation
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
WAT
Publisher:
International Conference on Computational Linguistics
Pages:
59–63
URL:
https://aclanthology.org/2022.wat-1.5
Cite (ACL):
Yilun Liu, Zhen Zhang, Shimin Tao, Junhui Li, and Hao Yang. 2022. HwTscSU’s Submissions on WAT 2022 Shared Task. In Proceedings of the 9th Workshop on Asian Translation, pages 59–63, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
Cite (Informal):
HwTscSU’s Submissions on WAT 2022 Shared Task (Liu et al., WAT 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.wat-1.5.pdf