Abstract
Advancements in Neural Machine Translation (NMT) greatly benefit the software localization industry by decreasing the post-editing time of human annotators. Although the volume of software being localized is growing significantly, techniques for improving NMT for user interface (UI) texts are lacking. These UI texts have different properties than other collections of texts, presenting unique challenges for NMT. For example, they are often very short, which makes them ambiguous and means they need additional context (button, title text, table item, etc.) for disambiguation. However, no UI data sets with contextual information are readily available for NMT models to exploit. This work aims to provide a first step in improving UI translations and to highlight their challenges. To achieve this, we provide a novel multilingual UI corpus collection (∼1.3M for English ↔ German) with a targeted test set and analyze the limitations of state-of-the-art methods on this challenging task. Specifically, we present a targeted test set for disambiguation from English to German to reliably evaluate and emphasize UI translation challenges. Furthermore, we evaluate several state-of-the-art NMT techniques from domain adaptation and document-level NMT on this challenging task. All the scripts to replicate the experiments and data sets are available here.
- Anthology ID:
- 2023.eacl-main.179
- Volume:
- Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2442–2454
- URL:
- https://aclanthology.org/2023.eacl-main.179
- Cite (ACL):
- Sai Koneru, Matthias Huck, Miriam Exel, and Jan Niehues. 2023. Analyzing Challenges in Neural Machine Translation for Software Localization. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2442–2454, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- Analyzing Challenges in Neural Machine Translation for Software Localization (Koneru et al., EACL 2023)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2023.eacl-main.179.pdf