Hans Christian Farsethås


2025

NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark
Vladislav Mikhailov | Tita Enstad | David Samuel | Hans Christian Farsethås | Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Findings of the Association for Computational Linguistics: ACL 2025

This paper introduces NorEval, a new and comprehensive evaluation suite for large-scale standardized benchmarking of Norwegian generative language models (LMs). NorEval consists of 24 high-quality human-created datasets – of which five are created from scratch. In contrast to existing benchmarks for Norwegian, NorEval covers a broad spectrum of task categories targeting Norwegian language understanding and generation, establishes human baselines, and focuses on both of the official written standards of the Norwegian language: Bokmål and Nynorsk. All our datasets and a collection of over 100 human-created prompts are integrated into LM Evaluation Harness, ensuring flexible and reproducible evaluation. We describe the NorEval design and present the results of benchmarking 19 open-source pretrained and instruction-tuned LMs for Norwegian in various scenarios. Our benchmark, evaluation framework, and annotation materials are publicly available.

The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
Javier de la Rosa | Vladislav Mikhailov | Lemei Zhang | Freddy Wetjen | David Samuel | Peng Liu | Rolv-Arild Braaten | Petter Mæhlum | Magnus Breder Birkenes | Andrey Kutuzov | Tita Enstad | Hans Christian Farsethås | Svein Arne Brygfjeld | Jon Atle Gulla | Stephan Oepen | Erik Velldal | Wilfred Østgulen | Lilja Øvrelid | Aslak Sira Myhre
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

The use of copyrighted materials in training language models raises critical legal and ethical questions. This paper presents a framework for, and the results of, empirically assessing the impact of publisher-controlled copyrighted corpora on the performance of generative large language models (LLMs) for Norwegian. Evaluating on a diverse set of tasks, we found that adding both books and newspapers to the data mixture of LLMs tends to improve their performance, while the addition of fiction works seems to be detrimental. Our experiments could inform the creation of a compensation scheme for authors whose works contribute to AI development.