MUNI-NLP Submission for Czech-Ukrainian Translation Task at WMT23

Pavel Rychly, Yuliia Teslia


Abstract
The system is trained on officially provided data only. We have heavily filtered all the data to remove machine-translated text, Russian text, and other noise. We use the DeepNorm modification of the transformer architecture in the TorchScale library, with 18 encoder layers and 6 decoder layers. The initial systems for backtranslation use the HFT tokenizer; the final system uses a custom tokenizer derived from HFT.
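The architecture described above maps onto TorchScale's configuration API. Below is a minimal sketch, not the authors' training code, of instantiating a DeepNorm encoder-decoder with the stated layer counts; the vocabulary size is an assumed placeholder, since in practice it would match the paper's custom tokenizer.

    from torchscale.architecture.config import EncoderDecoderConfig
    from torchscale.architecture.encoder_decoder import EncoderDecoder

    # Layer counts taken from the abstract; vocab_size is an assumed
    # placeholder and should match the custom tokenizer's vocabulary.
    config = EncoderDecoderConfig(
        vocab_size=32000,
        encoder_layers=18,
        decoder_layers=6,
        deepnorm=True,  # DeepNorm residual scaling and initialization
    )
    model = EncoderDecoder(config)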
Anthology ID:
2023.wmt-1.14
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
162–165
URL:
https://aclanthology.org/2023.wmt-1.14
DOI:
10.18653/v1/2023.wmt-1.14
Cite (ACL):
Pavel Rychly and Yuliia Teslia. 2023. MUNI-NLP Submission for Czech-Ukrainian Translation Task at WMT23. In Proceedings of the Eighth Conference on Machine Translation, pages 162–165, Singapore. Association for Computational Linguistics.
Cite (Informal):
MUNI-NLP Submission for Czech-Ukrainian Translation Task at WMT23 (Rychly & Teslia, WMT 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2023.wmt-1.14.pdf