Samsung R&D Institute Poland Participation in WMT 2022

Adam Dobrowolski, Mateusz Klimaszewski, Adam Myśliwy, Marcin Szymański, Jakub Kowalski, Kornelia Szypuła, Paweł Przewłocki, Paweł Przybysz


Abstract
This paper presents the system description of Samsung R&D Institute Poland participation in WMT 2022 for General MT solution for medium and low resource languages: Russian and Croatian. Our approach combines iterative noised/tagged back-translation and iterative distillation. We investigated different monolingual resources and compared their influence on final translations. We used available BERT-likemodels for text classification and for extracting domains of texts. Then we prepared an ensemble of NMT models adapted to multiple domains. Finally we attempted to predict ensemble weight vectors from the BERT-based domain classifications for individual sentences. Our final trained models reached quality comparable to best online translators using only limited constrained resources during training.
Anthology ID:
2022.wmt-1.17
Volume:
Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
WMT
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
251–259
Language:
URL:
https://aclanthology.org/2022.wmt-1.17
DOI:
Bibkey:
Cite (ACL):
Adam Dobrowolski, Mateusz Klimaszewski, Adam Myśliwy, Marcin Szymański, Jakub Kowalski, Kornelia Szypuła, Paweł Przewłocki, and Paweł Przybysz. 2022. Samsung R&D Institute Poland Participation in WMT 2022. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 251–259, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Samsung R&D Institute Poland Participation in WMT 2022 (Dobrowolski et al., WMT 2022)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.wmt-1.17.pdf