María Grandury
2026
BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
Jaap Jumelet | Abdellah Fourtassi | Akari Haga | Bastian Bunzeck | Bhargav Shandilya | Diana Galvan-Sosa | Faiz Ghifari Haznitrama | Francesca Padovani | Francois Meyer | Hai Hu | Julen Etxaniz | Laurent Prevot | Linyang He | María Grandury | Mila Marcheva | Negar Foroutan | Nikitas Theodoropoulos | Pouya Sadeghi | Siyuan Song | Suchir Salhan | Susana Zhou | Yurii Paniv | Ziyin Zhang | Arianna Bisazza | Alex Warstadt | Leshem Choshen
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
We present BabyBabelLM, a multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language. We curate developmentally plausible pretraining data aiming to cover the equivalent of 100M English words of content in each of 45 languages. We compile evaluation suites and train baseline models in each language. BabyBabelLM aims to facilitate multilingual pretraining and cognitive modeling.
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Alejandro Hernández-Cano | Alexander Hägele | Allen Hao Huang | Angelika Romanou | Antoni-Joan Solergibert | Barna Pásztor | Bettina Messmer | Dhia Garbaya | Eduard Frank Ďurech | Ido Hakimi | Juan Garcia Giraldo | Mete Ismayilzada | Negar Foroutan | Skander Moalla | Tiancheng Chen | Vinko Sabolčec | Yixuan Xu | Michael Aerni | Badr AlKhamissi | Inés Altemir Marinas | Mohammad Hossein Amani | Matin Ansaripour | Ilia Badanin | Harold Benoit | Emanuela Boros | Nicholas John Browning | Fabian Bösch | Maximilian Böther | Niklas Canova | Camille Challier | Clément Charmillot | Jonathan Coles | Jan Milan Deriu | Arnout Devos | Lukas Drescher | Daniil Dzenhaliou | Maud Ehrmann | Dongyang Fan | Simin Fan | Silin Gao | Miguel Gila | María Grandury | Diba Hashemi | Alexander Miserlis Hoyle | Jiaming Jiang | Mark Klein | Andrei Kucharavy | Anastasiia Kucherenko | Frederike Lübeck | Roman Machacek | Theofilos Ioannis Manitaras | Andreas Marfurt | Kyle Matoba | Simon Matrenok | Henrique Mendonça | Fawzi Roberto Mohamed | Syrielle Montariol | Luca Mouchel | Sven Najem-Meyer | Jingwei Ni | Gennaro Oliva | Matteo Pagliardini | Elia Palme | Andrei Panferov | Léo Paoletti | Marco Passerini | Ivan Pavlov | Auguste Poiroux | Kaustubh Ponkshe | Nathan Ranchin | Javier Rando | Mathieu Sauser | Jakhongir Saydaliev | Mukhammadali Sayfiddinov | Marian Schneider | Stefano Schuppli | Marco Scialanga | Andrei Semenov | Kumar Shridhar | Raghav Singhal | Anna Sotnikova | Alexander Sternfeld | Ayush Kumar Tarun | Paul Teiletche | Jannis Vamvas | Xiaozhe Yao | Hao Zhao | Alexander Ilic | Ana Klimovic | Andreas Krause | Caglar Gulcehre | David Rosenthal | Elliott Ash | Florian Tramèr | Joost VandeVondele | Livio Veraldi | Martin Rajman | Thomas C. Schulthess | Torsten Hoefler | Antoine Bosselut | Martin Jaggi | Imanol Schlag
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Open LLMs enable AI practitioners to control development costs by building on an existing foundation for downstream applications. While offering substantial promise, current models often fail to meet the needs of users who require open solutions aligned with responsible AI principles, including data compliance, transparency, and inclusivity. In this work, we present Apertus, a fully open suite of large language models (LLMs) designed to address responsibility shortcomings in today’s open model ecosystem, namely data responsibility and global representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of data memorization, we also adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. Apertus also drastically expands multilingual coverage, training on 15T tokens from over 1800 languages, with about 40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivaling or surpassing open-weight counterparts.
2025
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans
Javier Conde | Miguel González Saiz | María Grandury | Pedro Reviriego | Gonzalo Martínez | Marc Brysbaert
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
The evaluation of LLMs has so far focused primarily on how well they can perform different tasks such as reasoning, question-answering, paraphrasing, or translating. For most of these tasks, performance can be measured with objective metrics, such as the number of correct answers. However, other language features are not easily quantified. Examples include the arousal, concreteness, or gender associated with a given word, as well as the extent to which we experience words with senses and relate them to a specific sense. Those features have been studied for many years in psycholinguistics through large-scale experiments with humans that produce ratings for thousands of words. This opens an opportunity to evaluate how well LLMs align with human ratings on these word features, taking advantage of existing studies that cover many different language features in a large number of words. In this paper, we evaluate the alignment of a representative group of LLMs with human ratings on two psycholinguistic datasets: the Glasgow and Lancaster norms. These datasets cover thirteen features over thousands of words. The results show that alignment is significantly better on the Glasgow norms evaluated (arousal, valence, dominance, concreteness, imageability, familiarity, and gender) than on the Lancaster norms evaluated (interoceptive, gustatory, olfactory, haptic, auditory, and visual). This suggests a limitation of current LLMs in aligning with human sensory associations for words, which may be due to their lack of the embodied cognition present in humans, and illustrates the usefulness of evaluating LLMs with psycholinguistic datasets.
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
María Grandury | Javier Aula-Blasco | Júlia Falcão | Clémentine Fourrier | Miguel González Saiz | Gonzalo Martínez | Gonzalo Santamaria Gomez | Rodrigo Agerri | Nuria Aldama García | Luis Chiruzzo | Javier Conde | Helena Gomez Adorno | Marta Guerrero Nieto | Guido Ivetta | Natàlia López Fuertes | Flor Miriam Plaza-del-Arco | María-Teresa Martín-Valdivia | Helena Montoro Zamorano | Carmen Muñoz Sanz | Pedro Reviriego | Leire Rosado Plaza | Alejandro Vaca Serrano | Estrella Vallecillo-Rodríguez | Jorge Vallego | Irune Zubiaga
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Catalan, Basque, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community.
Co-authors
- Javier Conde 2
- Negar Foroutan 2
- Gonzalo Martínez 2
- Pedro Reviriego 2
- Miguel González Saiz 2
- Michael Aerni 1
- Rodrigo Agerri 1
- Badr AlKhamissi 1
- Mohammad Hossein Amani 1
- Matin Ansaripour 1
- Elliott Ash 1
- Javier Aula-Blasco 1
- Ilia Badanin 1
- Harold Benoit 1
- Arianna Bisazza 1
- Emanuela Boroş 1
- Antoine Bosselut 1
- Nicholas John Browning 1
- Marc Brysbaert 1
- Bastian Bunzeck 1
- Fabian Bösch 1
- Maximilian Böther 1
- Niklas Canova 1
- Camille Challier 1
- Clément Charmillot 1
- Tiancheng Chen 1
- Luis Chiruzzo 1
- Leshem Choshen 1
- Jonathan Coles 1
- Jan Milan Deriu 1
- Arnout Devos 1
- Lukas Drescher 1
- Daniil Dzenhaliou 1
- Maud Ehrmann 1
- Julen Etxaniz 1
- Júlia Falcão 1
- Dongyang Fan 1
- Simin Fan 1
- Clémentine Fourrier 1
- Abdellah Fourtassi 1
- Natàlia López Fuertes 1
- Diana Galván-Sosa 1
- Silin Gao 1
- Dhia Garbaya 1
- Nuria Aldama García 1
- Miguel Gila 1
- Juan Garcia Giraldo 1
- Gonzalo Santamaria Gomez 1
- Helena Gomez Adorno 1
- Çağlar Gülçehre 1
- Akari Haga 1
- Ido Hakimi 1
- Diba Hashemi 1
- Faiz Ghifari Haznitrama 1
- Linyang He 1
- Alejandro Hernández-Cano 1
- Torsten Hoefler 1
- Alexander Miserlis Hoyle 1
- Hai Hu 1
- Allen Hao Huang 1
- Alexander Hägele 1
- Alexander Ilic 1
- Mete Ismayilzada 1
- Guido Ivetta 1
- Martin Jaggi 1
- Jiaming Jiang 1
- Jaap Jumelet 1
- Mark Klein 1
- Ana Klimovic 1
- Andreas Krause 1
- Andrei Kucharavy 1
- Anastasiia Kucherenko 1
- Frederike Lübeck 1
- Roman Machacek 1
- Theofilos Ioannis Manitaras 1
- Mila Marcheva 1
- Andreas Marfurt 1
- Inés Altemir Marinas 1
- María-Teresa Martín-Valdivia 1
- Kyle Matoba 1
- Simon Matrenok 1
- Henrique Mendonça 1
- Bettina Messmer 1
- Francois Meyer 1
- Skander Moalla 1
- Fawzi Roberto Mohamed 1
- Syrielle Montariol 1
- Luca Mouchel 1
- Sven Najem-Meyer 1
- Jingwei Ni 1
- Marta Guerrero Nieto 1
- Gennaro Oliva 1
- Francesca Padovani 1
- Matteo Pagliardini 1
- Elia Palme 1
- Andrei Panferov 1
- Yurii Paniv 1
- Léo Paoletti 1
- Marco Passerini 1
- Ivan Pavlov 1
- Leire Rosado Plaza 1
- Flor Miriam Plaza-del-Arco 1
- Auguste Poiroux 1
- Kaustubh Ponkshe 1
- Laurent Prévot 1
- Barna Pásztor 1
- Martin Rajman 1
- Nathan Ranchin 1
- Javier Rando 1
- Angelika Romanou 1
- David Rosenthal 1
- Vinko Sabolčec 1
- Pouya Sadeghi 1
- Suchir Salhan 1
- Carmen Muñoz Sanz 1
- Mathieu Sauser 1
- Jakhongir Saydaliev 1
- Mukhammadali Sayfiddinov 1
- Imanol Schlag 1
- Marian Schneider 1
- Thomas C. Schulthess 1
- Stefano Schuppli 1
- Marco Scialanga 1
- Andrei Semenov 1
- Bhargav Shandilya 1
- Kumar Shridhar 1
- Raghav Singhal 1
- Antoni-Joan Solergibert 1
- Siyuan Song 1
- Anna Sotnikova 1
- Alexander Sternfeld 1
- Ayush Kumar Tarun 1
- Paul Teiletche 1
- Nikitas Theodoropoulos 1
- Florian Tramèr 1
- Alejandro Vaca Serrano 1
- Estrella Vallecillo-Rodríguez 1
- Jorge Vallego 1
- Jannis Vamvas 1
- Joost VandeVondele 1
- Livio Veraldi 1
- Alex Warstadt 1
- Yixuan Xu 1
- Xiaozhe Yao 1
- Helena Montoro Zamorano 1
- Ziyin Zhang 1
- Hao Zhao 1
- Susana Zhou 1
- Irune Zubiaga 1
- Eduard Frank Ďurech 1