ItaEval: A CALAMITA Challenge
Giuseppe Attanasio, Moreno La Quatra, Andrea Santilli, Beatrice Savoldi
Abstract
In recent years, new language models for Italian have proliferated. However, evaluation methodologies for these models have not kept pace, remaining fragmented and often limited to the experimental sections of individual model releases. This paper introduces ItaEval, a multifaceted evaluation suite designed to address this gap. By reviewing recent literature on the evaluation of contemporary language models, we devise three overarching task categories—natural language understanding, commonsense and factual knowledge, and bias, fairness, and safety—that a contemporary model should be able to address. Next, we collect a set of 18 tasks encompassing existing and new datasets. The resulting ItaEval suite provides a standardized, multifaceted framework for evaluating Italian language models, facilitating more rigorous and comparative assessments of model performance. We release code and data at https://rita-nlp.org/sprints/itaeval.
- Anthology ID:
- 2024.clicit-1.117
- Volume:
- Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
- Month:
- December
- Year:
- 2024
- Address:
- Pisa, Italy
- Editors:
- Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
- Venue:
- CLiC-it
- Publisher:
- CEUR Workshop Proceedings
- Pages:
- 1064–1073
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.clicit-1.117/
- Cite (ACL):
- Giuseppe Attanasio, Moreno La Quatra, Andrea Santilli, and Beatrice Savoldi. 2024. ItaEval: A CALAMITA Challenge. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 1064–1073, Pisa, Italy. CEUR Workshop Proceedings.
- Cite (Informal):
- ItaEval: A CALAMITA Challenge (Attanasio et al., CLiC-it 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.clicit-1.117.pdf