Gianluca Barmina
2026
SommBench: Assessing Sommelier Expertise of Language Models
William Brach | Tomas Bedej | Jacob Nielsen | Jacob Pichna | Juraj Bedej | Eemeli Saarensilta | Julie Dupouy | Gianluca Barmina | Andrea Blasi Núñez | Peter Schneider-Kamp | Kristian Košťál | Michal Ries | Lukas Galke Poech
Proceedings of the Fifteenth Language Resources and Evaluation Conference
With the rapid advances of large language models, it becomes increasingly important to systematically evaluate their multilingual and multicultural capabilities. Previous cultural evaluation benchmarks focus mainly on basic cultural knowledge that can be encoded in linguistic form. Here, we propose SommBench, a multilingual benchmark to assess sommelier expertise, a domain deeply grounded in the senses of smell and taste. While language models learn about sensory properties exclusively through textual descriptions, SommBench tests whether this textual grounding is sufficient to emulate expert-level sensory judgment. SommBench comprises three main tasks: Wine Theory Question Answering (WTQA), Wine Feature Completion (WFC), and Food-Wine Pairing (FWP). SommBench is available in multiple languages: English, Slovak, Swedish, Finnish, German, Danish, Italian, and Spanish. This helps separate a language model’s wine expertise from its language skills. The benchmark datasets were developed in close collaboration with a professional sommelier and native speakers of the respective languages, resulting in 1,024 questions for wine theory question answering, 1,000 examples for wine feature completion, and 1,000 examples of food-wine pairing. We provide results for the most popular language models, including closed-weights models such as Gemini 2.5, and open-weights models, such as GPT-OSS and Qwen 3. Our results show that the most capable models perform well on wine theory question answering (up to 97% correct with a closed-weights model), yet feature completion (peaking at 65%) and food-wine pairing (MCC ranging between 0 and 0.39) turn out to be more challenging. These results position SommBench as an interesting and challenging benchmark for evaluating the sommelier expertise of language models. The benchmark is publicly available at https://github.com/sommify/sommbench.
DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors
Gianluca Barmina | Nathalie Carmen Hau Norman | Peter Schneider-Kamp | Lukas Galke Poech
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present an enhanced benchmark for evaluating linguistic acceptability in Danish. We first analyze the most common errors found in written Danish. Based on this analysis, we introduce a set of fourteen corruption functions that generate incorrect sentences by systematically introducing errors into existing correct Danish sentences. To ensure the accuracy of these corruptions, we assess their validity using both manual and automatic methods. The results are then used as a benchmark for evaluating Large Language Models on a linguistic acceptability judgement task. Our findings demonstrate that this extension is both broader and more comprehensive than the current state of the art. By incorporating a greater variety of corruption types, our benchmark provides a more rigorous assessment of linguistic acceptability, increasing task difficulty, as evidenced by the lower performance of LLMs on our benchmark compared to existing ones. Our results also suggest that our benchmark has a higher discriminatory power, allowing it to better distinguish well-performing models from low-performing ones.
Dynaword: From One-shot to Continuously Developed Datasets
Kenneth Enevoldsen | Kristian Nørgaard Jensen | Jan Kostkan | Balázs Szabó | Márton Kardos | Kirsten Vad | Johan Heinsen | Andrea Blasi Núñez | Gianluca Barmina | Jacob Nielsen | Rasmus Larsen | Rob van der Goot | Peter Vahlstrup | Per Møldrup Dalum | Desmond Elliott | Lukas Galke Poech | Peter Schneider-Kamp | Kristoffer Nielbo
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Large-scale datasets are foundational for research and development in natural language processing. However, current approaches face three key challenges: (1) reliance on ambiguously licensed sources restricting use, sharing, and derivative works; (2) static dataset releases that prevent community contributions and diminish longevity; and (3) quality assurance processes restricted to publishing teams rather than leveraging community expertise. To address these limitations, we introduce two contributions: the Dynaword approach and Danish Dynaword. The Dynaword approach is a framework for creating large-scale, open datasets that can be continuously updated through community collaboration. Danish Dynaword is a concrete implementation that validates this approach and demonstrates its potential. Danish Dynaword contains over five times as many tokens as comparable releases, is exclusively openly licensed, and has received multiple contributions across industry, the public sector and research institutions. The repository includes light-weight tests to ensure data formatting, quality, and documentation, establishing a sustainable framework for ongoing community contributions and dataset evolution.
Co-authors
- Lukas Galke Poech 3
- Peter Schneider-Kamp 3
- Jacob Nielsen 2
- Andrea Blasi Núñez 2
- Tomas Bedej 1
- Juraj Bedej 1
- William Brach 1
- Per Møldrup Dalum 1
- Julie Dupouy 1
- Desmond Elliott 1
- Kenneth Enevoldsen 1
- Rob van der Goot 1
- Johan Heinsen 1
- Kristian Nørgaard Jensen 1
- Márton Kardos 1
- Jan Kostkan 1
- Kristian Košťál 1
- Rasmus Larsen 1
- Kristoffer Nielbo 1
- Nathalie Carmen Hau Norman 1
- Jacob Pichna 1
- Michal Ries 1
- Eemeli Saarensilta 1
- Balázs Szabó 1
- Kirsten Vad 1
- Peter Vahlstrup 1
Venues
- LREC 3