Every Word Presented in Context: Syntactic Coverage as Objective for Low-Resource Machine Translation with Large Language Models

Samuel Frontull, Thomas Ströhle


Abstract
Large Language Models (LLMs) have demonstrated strong capabilities in multilingual machine translation. However, they underperform on low-resource languages, indicating the need for more explicit instructional guidance. In this work, we introduce Fragment-Shot Prompting, a novel few-shot prompting method that retrieves examples for every word occurring in the sentence to be translated, illustrating each word's use and meaning in context. We evaluate our method on translation between Italian, Ladin (Val Badia), and Ladin (Gherdëina) and compare its performance with zero-shot prompting, random few-shot prompting, and established lexical and semantic retrieval strategies. We conduct these experiments using state-of-the-art LLMs, including GPT-3.5, GPT-4o, o1-mini, Llama-3.3, and DeepSeek-R1. Our results demonstrate that LLMs can extract substantial value from limited data when translating from the low-resource to the high-resource language. However, this does not hold for translation into the low-resource languages, where the prompting method plays a much more important role. In particular, our method consistently delivers the best results and enables significant gains. Even though translation performance into Ladin remains limited with the available resources, our results highlight the importance of syntactic coverage for improving translation accuracy, as well as of variant-specific adaptation in low-resource scenarios.
Anthology ID:
2026.lrec-main.694
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
8824–8837
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.694/
Cite (ACL):
Samuel Frontull and Thomas Ströhle. 2026. Every Word Presented in Context: Syntactic Coverage as Objective for Low-Resource Machine Translation with Large Language Models. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 8824–8837, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
Every Word Presented in Context: Syntactic Coverage as Objective for Low-Resource Machine Translation with Large Language Models (Frontull & Ströhle, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.694.pdf