Machine Translation of Folktales: small-data-driven and LLM-based approaches

Olena Burda-Lassen


Abstract
Can Large Language Models translate texts with rich cultural elements? How “cultured” are they? This paper gives an overview of an experiment in Machine Translation of Ukrainian folktales using Large Language Models (OpenAI), the Google Cloud Translation API, and Opus MT. After benchmarking their performance, we fine-tuned an Opus MT model on a small domain-specific dataset created specifically for translating folktales from Ukrainian to English. We also tested various prompt engineering techniques on the new OpenAI models to generate translations of our test dataset (the folktale ‘The Mitten’) and observed promising results. This research explores the importance of both small data and Large Language Models in Machine Learning, specifically in Machine Translation of literary texts, using Ukrainian folktales as an example.
Anthology ID:
2023.clasp-1.8
Volume:
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
Month:
September
Year:
2023
Address:
Gothenburg, Sweden
Editors:
Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, Simon Dobnik
Venue:
CLASP
Publisher:
Association for Computational Linguistics
Pages:
68–71
URL:
https://aclanthology.org/2023.clasp-1.8
Cite (ACL):
Olena Burda-Lassen. 2023. Machine Translation of Folktales: small-data-driven and LLM-based approaches. In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 68–71, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
Machine Translation of Folktales: small-data-driven and LLM-based approaches (Burda-Lassen, CLASP 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.clasp-1.8.pdf