BaBIEs: A Benchmark for the Linguistic Evaluation of Italian Baby Language Models

Luca Capone, Alice Suozzi, Gianluca Lebani, Alessandro Lenci


Abstract
Comparing the linguistic competence of Language Models (LMs) to that of children has recently gained growing attention, raising the need for effective tools to evaluate both. To this end, we developed a resource for the linguistic evaluation of BabyLMs, i.e., LMs trained on datasets comparable to the linguistic input received by children. This resource adapts four standardized tests for assessing the linguistic skills of Italian-speaking children (BVL, TROG-2, TCGB-2, and Peabody). To verify the effectiveness of our benchmark, we administered it to Minerva, an LLM pretrained from scratch on Italian. Our results indicate that Minerva struggles to master certain linguistic aspects, achieving an age-equivalent score of 4 years, and that the type of task administered affects the model's performance.
Anthology ID:
2024.clicit-1.20
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
157–170
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.clicit-1.20/
Cite (ACL):
Luca Capone, Alice Suozzi, Gianluca Lebani, and Alessandro Lenci. 2024. BaBIEs: A Benchmark for the Linguistic Evaluation of Italian Baby Language Models. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 157–170, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
BaBIEs: A Benchmark for the Linguistic Evaluation of Italian Baby Language Models (Capone et al., CLiC-it 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.clicit-1.20.pdf