Improving Domain-Specific Translation from English into Ukrainian with Retrieval-Augmented Generation

Anton Shpigunov

Abstract

Large language models have demonstrated competence as language translators, including for lower-resourced languages like Ukrainian. However, in specialized or novel domains, translation quality can suffer without adequate lexical and stylistic reference material. We present a retrieval-augmented approach to English-Ukrainian machine translation in a narrow domain: a private legal/military bilingual corpus. In this approach, semantically similar translation units retrieved via vector embeddings are provided as in-context examples to the LLM. We evaluate three open-weight Gemma 3 models, 4B, 12B, and 27B, against Gemini 3 Flash as a baseline across five augmentation conditions, with k values of 0, 3, 5, 10, and 25, on a 2,581-pair index and a 258-pair test set. We find that context augmentation yields statistically significant improvements in both ChrF++ and COMET for all models, with the smallest model’s COMET score improving by 0.076 at k = 3. However, smaller models exhibit context saturation: the 4B model’s performance peaks at k = 10 and degrades with additional context, losing 9.72 ChrF++ points and 0.007 COMET between k = 10 and k = 25, while larger models continue to benefit.
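The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not name the embedding model, the similarity measure, or the prompt template, so everything below is an assumption. The `embed` function is a toy character-trigram stand-in for a real sentence-embedding model, cosine similarity is assumed as the ranking measure, the prompt wording is hypothetical, and the three index pairs are made-up placeholders rather than data from the private corpus.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence-embedding model (assumption): trigram counts."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[key] for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(source, index, k):
    """Return the k (English, Ukrainian) pairs whose English side is closest to source."""
    if k == 0:
        return []  # the unaugmented condition: no in-context examples
    query = embed(source)
    ranked = sorted(index, key=lambda pair: cosine(query, embed(pair[0])), reverse=True)
    return ranked[:k]

def build_prompt(source, examples):
    """Assemble a few-shot prompt (hypothetical template) from retrieved pairs."""
    parts = ["Translate from English into Ukrainian."]
    for en, uk in examples:
        parts.append(f"English: {en}\nUkrainian: {uk}")
    parts.append(f"English: {source}\nUkrainian:")
    return "\n\n".join(parts)

# Made-up placeholder translation units, for illustration only.
INDEX = [
    ("The contract enters into force upon signature.",
     "Договір набирає чинності з моменту підписання."),
    ("The unit shall report to the commander.",
     "Підрозділ доповідає командиру."),
    ("The weather is nice today.",
     "Сьогодні гарна погода."),
]

source = "The contract enters into force on 1 May."
prompt = build_prompt(source, retrieve(source, INDEX, k=1))
```

Varying `k` in `retrieve` corresponds to the augmentation conditions in the abstract (k = 0, 3, 5, 10, 25). A real system would precompute embeddings for the index once instead of re-embedding every pair per query.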

Anthology ID: 2026.unlp-1.1
Volume: Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Month: May
Year: 2026
Address: Lviv, Ukraine
Editor: Mariana Romanyshyn
Venue: UNLP
Publisher: Association for Computational Linguistics
Pages: 1–11
URL: https://preview.aclanthology.org/bulk-corrections-2026-07-02/2026.unlp-1.1/
Cite (ACL): Anton Shpigunov. 2026. Improving Domain-Specific Translation from English into Ukrainian with Retrieval-Augmented Generation. In Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026), pages 1–11, Lviv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Improving Domain-Specific Translation from English into Ukrainian with Retrieval-Augmented Generation (Shpigunov, UNLP 2026)
PDF: https://preview.aclanthology.org/bulk-corrections-2026-07-02/2026.unlp-1.1.pdf
