Abstract
The system is trained on officially provided data only. We heavily filtered all the data to remove machine-translated text, Russian text, and other noise. We use the DeepNorm modification of the transformer architecture from the TorchScale library, with 18 encoder layers and 6 decoder layers. The initial systems for backtranslation use the HFT tokenizer; the final system uses a custom tokenizer derived from HFT.
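The model configuration described above can be expressed compactly in TorchScale. The following is a minimal sketch, not the authors' submission code: it assumes TorchScale's EncoderDecoderConfig/EncoderDecoder API with the encoder_layers/decoder_layers/deepnorm options; the vocabulary size is an illustrative placeholder, and only the layer counts and the DeepNorm switch follow the abstract.

```python
# Sketch of a DeepNorm encoder-decoder in TorchScale (assumed API, not the authors' code).
from torchscale.architecture.config import EncoderDecoderConfig
from torchscale.architecture.encoder_decoder import EncoderDecoder

config = EncoderDecoderConfig(
    vocab_size=32000,    # placeholder; depends on the tokenizer actually used
    encoder_layers=18,   # 18 encoder layers, as stated in the abstract
    decoder_layers=6,    # 6 decoder layers
    deepnorm=True,       # DeepNorm residual scaling and initialization
)
model = EncoderDecoder(config)
print(model)
```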
- Anthology ID: 2023.wmt-1.14
- Volume: Proceedings of the Eighth Conference on Machine Translation
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
- Venue: WMT
- SIG: SIGMT
- Publisher: Association for Computational Linguistics
- Pages: 162–165
- URL: https://aclanthology.org/2023.wmt-1.14
- DOI: 10.18653/v1/2023.wmt-1.14
- Cite (ACL): Pavel Rychly and Yuliia Teslia. 2023. MUNI-NLP Submission for Czech-Ukrainian Translation Task at WMT23. In Proceedings of the Eighth Conference on Machine Translation, pages 162–165, Singapore. Association for Computational Linguistics.
- Cite (Informal): MUNI-NLP Submission for Czech-Ukrainian Translation Task at WMT23 (Rychly & Teslia, WMT 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2023.wmt-1.14.pdf