How Well do LLMs know Finno-Ugric Languages? A Systematic Assessment

Hele-Andra Kuulmets; Taido Purason; Mark Fishel

Abstract

We present a systematic evaluation of the multilingual capabilities of open large language models (LLMs), focusing specifically on five Finno-Ugric (FiU) languages. Our investigation covers multiple prompting strategies across several benchmarks and reveals that Llama-2 7B and Llama-2 13B perform weakly on most FiU languages. In contrast, Llama 3.1 models show impressive improvements, even for extremely low-resource languages such as Võro and Komi, indicating successful cross-lingual knowledge transfer inside the models. Finally, we show that stronger base models outperform weaker, language-adapted models, emphasizing the importance of the base model in successful language adaptation.

Anthology ID: 2025.nodalida-1.37
Volume: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
Month: March
Year: 2025
Address: Tallinn, Estonia
Editors: Richard Johansson, Sara Stymne
Venue: NoDaLiDa
Publisher: University of Tartu Library
Pages: 340–353
URL: https://preview.aclanthology.org/fix-sig-urls/2025.nodalida-1.37/
Cite (ACL): Hele-Andra Kuulmets, Taido Purason, and Mark Fishel. 2025. How Well do LLMs know Finno-Ugric Languages? A Systematic Assessment. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), pages 340–353, Tallinn, Estonia. University of Tartu Library.
Cite (Informal): How Well do LLMs know Finno-Ugric Languages? A Systematic Assessment (Kuulmets et al., NoDaLiDa 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.nodalida-1.37.pdf