First Steps in Benchmarking Latvian in Large Language Models

Inguna Skadiņa; Bruno Bakanovs; Roberts Darģis

Abstract

The performance of multilingual large language models (LLMs) in low-resource languages, such as Latvian, has been under-explored. In this paper, we investigate the capabilities of several open and commercial LLMs on Latvian language understanding tasks. We evaluate these models on several well-known benchmarks, such as the Choice of Plausible Alternatives (COPA) and Measuring Massive Multitask Language Understanding (MMLU), which were adapted into Latvian using machine translation. Our results show significant variability in model performance, underscoring the challenges of extending LLMs to low-resource languages. We also analyze the effect of post-editing machine-translated datasets and observe notable improvements in model accuracy, particularly with BERT-based architectures. In addition, we assess open-source LLMs on the Belebele dataset, where open-weight models perform competitively with proprietary systems. This study offers insight into the limitations of current LLMs in low-resource settings and provides datasets for future benchmarking efforts.

Anthology ID: 2025.resourceful-1.22
Volume: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Month: March
Year: 2025
Address: Tallinn, Estonia
Editors: Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
Venues: RESOURCEFUL | WS
Publisher: University of Tartu Library, Estonia
Pages: 86–95
URL: https://preview.aclanthology.org/landing_page/2025.resourceful-1.22/
Cite (ACL): Inguna Skadina, Bruno Bakanovs, and Roberts Darģis. 2025. First Steps in Benchmarking Latvian in Large Language Models. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 86–95, Tallinn, Estonia. University of Tartu Library, Estonia.
Cite (Informal): First Steps in Benchmarking Latvian in Large Language Models (Skadina et al., RESOURCEFUL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.resourceful-1.22.pdf
