Iria de-Dios-Flores
2026
The MultiplEYE Text Corpus: Towards a Diverse and Ever-Expanding Multilingual Text Corpus
Ramunė Kasperė | Anna Bondar | Sergiu Nisioi | Maja Stegenwallner-Schütz | Hanne B. Søndergaard Knudsen | Ana Matić | Eva Pavlinušić Vilus | Dorota Klimek-Jankowska | Chiara Tschirner | Not Battesta Soliva | Deborah N. Jakobi | Cui Ding | Dima Abu Romi | Cengiz Acarturk | Matilda Agdler | Anton Marius Alexandru | Mohd Faizan Ansari | Annalisa Arcidiacono | Elizabete Ausma Velta Barisa | Ana Bautista | Lisa Beinborn | Yevgeni Berzak | Nedeljka Bjelanović | Anna Isabelle Bothmann | Jan Brasser | Caterina Cacioli | Anila Çepani | Ilze Ceple | Adelina Cerpja | Dalí Chirino | Jan Chromý | Alessandro Corona Mendozza | Iria de-Dios-Flores | Nazik Dinçtopal Deniz | Ana Došen | Kristian Elersič | Inmaculada Fajardo | Zigmunds Freibergs | Angelina Ganebnaya | Shan Gao | Jéssica Gomes | Annjo Klungervik Greenall | Alba Haveriku | Miao He | Anamaria Hodivoianu | Yu-Yin Hsu | Amanda Isaksen | Andreia Janeiro | Kristine Jensen de López | Aleksandar Jevremovic | Vojislav Jovanovic | Hanna Kędzierska | Nik Kharlamov | Sara Kosutar | Nelda Kote | Vanja Kovic | Izabela Krejtz | Thyra Krosness | Oleksandra Kuvshynova | Eilam Lavy | Ella Lion | Marta Łockiewicz | Kaidi Lõo | Paula Luegi | Mircea Mihai Marin | Clara Martin | Svitlana Matvieieva | Diane C. Mézière | Xavier Mínguez-López | Valeriia Modina | Jurgita Motiejūnienė | Marie-Luise Müller | Tolgonai Nasipbek kyzy | Jamal Abdul Nasir | Johanne S. K. Nedergård | Ayşegül Özkan | Patrizia Paggio | Marijan Palmović | Maria Christina Panagiotopoulou | Alberto Parola | Helena Pérez | Klaudia Petersen | Anja Podlesek | Eva Pospíšilová | Marta Praulina | Mikuláš Preininger | Loredana Pungă | Diego Rossini | Špela Rot | Habib Sani Yahaya | Irina A. Sekerina | Anne Gabija Skadina | Jordi Solé-Casals | Lonneke van der Plas | Saara M. Varjopuro | Spyridoula Varlokosta | João Veríssimo | Oskari Juhapekka Virtanen | Nemanja Vračar | Mila Vulchanova | Ahmad Mustapha Wali | Peizheng Wu | Nilgün Yücel | Stefan Frank | Nora Hollenstein | Lena Jäger
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present the MultiplEYE Text Corpus, a large-scale, document-level, multi-parallel resource designed to advance cross-linguistic research on reading and language processing. The corpus provides paragraph-level alignment for texts in 39 languages spanning seven language families and seven scripts. Unlike many existing multilingual corpora, a substantial number of documents were originally written in languages other than English, reducing English-centric bias and supporting more typologically diverse investigations. The texts are carefully selected to balance linguistic richness with experimental feasibility, particularly for eye-tracking-while-reading studies. Developed within a multi-lab initiative, the MultiplEYE Text Corpus follows unified translation, alignment, and experimental design guidelines to ensure cross-linguistic comparability. Its inclusion of texts varying in type and difficulty enables research on discourse-level processing, genre effects, and individual differences across a wide range of languages. The text corpus and accompanying metadata provide a robust foundation for multilingual psycholinguistic and computational modeling research. Data and materials are publicly available at https://doi.org/10.23668/psycharchives.21750.
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
Marlo Souza | Iria de-Dios-Flores | Diana Santos | Larissa Freitas | Jackson Wilke da Cruz Souza | Eugénio Ribeiro
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Marlo Souza | Iria de-Dios-Flores | Diana Santos | Larissa Freitas | Jackson Wilke da Cruz Souza | Eugénio Ribeiro
2025
Truth Knows No Language: Evaluating Truthfulness Beyond English
Blanca Calvo Figueras | Eneko Sagarzazu | Julen Etxaniz | Jeremy Barnes | Pablo Gamallo | Iria de-Dios-Flores | Rodrigo Agerri
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. Truthfulness evaluations of large language models (LLMs) have primarily been focused on English. However, the ability of LLMs to maintain truthfulness across languages remains under-explored. Our study evaluates 12 state-of-the-art open LLMs, comparing base and instruction-tuned models using human evaluation, multiple-choice metrics, and LLM-as-a-Judge scoring. Our findings reveal that, while LLMs perform best in English and worst in Basque (the lowest-resourced language), overall truthfulness discrepancies across languages are smaller than anticipated. Furthermore, we show that LLM-as-a-Judge correlates more closely with human judgments than multiple-choice metrics, and that informativeness plays a critical role in truthfulness assessment. Our results also indicate that machine translation provides a viable approach for extending truthfulness benchmarks to additional languages, offering a scalable alternative to professional translation. Finally, we observe that universal knowledge questions are better handled across languages than context- and time-dependent ones, highlighting the need for truthfulness evaluations that account for cultural and temporal variability. Datasets, models and code are publicly available under open licenses.
IberoBench: A Benchmark for LLM Evaluation in Iberian Languages
Irene Baucells | Javier Aula-Blasco | Iria de-Dios-Flores | Silvia Paniagua Suárez | Naiara Perez | Anna Salles | Susana Sotelo Docio | Júlia Falcão | Jose Javier Saiz | Robiert Sepulveda Torres | Jeremy Barnes | Pablo Gamallo | Aitor Gonzalez-Agirre | German Rigau | Marta Villegas
Proceedings of the 31st International Conference on Computational Linguistics
The current best practice to measure the performance of base Large Language Models is to establish a multi-task benchmark that covers a range of capabilities of interest. Currently, however, such benchmarks are only available in a few high-resource languages. To address this situation, we present IberoBench, a multilingual, multi-task benchmark for Iberian languages (i.e., Basque, Catalan, Galician, European Spanish and European Portuguese) built on the LM Evaluation Harness framework. The benchmark consists of 62 tasks divided into 179 subtasks. We evaluate 33 existing LLMs on IberoBench in 0- and 5-shot settings. We also explore the issues we encounter when working with the Harness and our approach to solving them to ensure high-quality evaluation.
2024
CorpusNÓS: A massive Galician corpus for training large language models
Iria de-Dios-Flores | Silvia Paniagua Suárez | Cristina Carbajal Pérez | Daniel Bardanca Outeiriño | Marcos Garcia | Pablo Gamallo
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
Exploring the effects of vocabulary size in neural machine translation: Galician as a target language
Daniel Bardanca Outeirinho | Pablo Gamallo Otero | Iria de-Dios-Flores | José Ramom Pichel Campos
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
2023
Dependency resolution at the syntax-semantics interface: psycholinguistic and computational insights on control dependencies
Iria de-Dios-Flores | Juan Garcia Amboage | Marcos Garcia
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Using psycholinguistic and computational experiments we compare the ability of humans and several pre-trained masked language models to correctly identify control dependencies in Spanish sentences such as ‘José le prometió/ordenó a María ser ordenado/a’ (‘Joseph promised/ordered Mary to be tidy’). These structures underlie complex anaphoric and agreement relations at the interface of syntax and semantics, allowing us to study lexically-guided antecedent retrieval processes. Our results show that while humans correctly identify the (un)acceptability of the strings, language models often fail to identify the correct antecedent in non-adjacent dependencies, showing their reliance on linearity. Additional experiments on Galician reinforce these conclusions. Our findings are equally valuable for the evaluation of language models’ ability to capture linguistic generalizations, as well as for psycholinguistic theories of anaphor resolution.
2022
The Nós Project: Opening routes for the Galician language in the field of language technologies
Iria de-Dios-Flores | Carmen Magariños | Adina Ioana Vladu | John E. Ortega | José Ramom Pichel | Marcos García | Pablo Gamallo | Elisa Fernández Rei | Alberto Bugarín-Diz | Manuel González González | Senén Barro | Xosé Luis Regueira
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference
The development of language technologies (LTs) such as machine translation, text analytics, and dialogue systems is essential in the current digital society, culture and economy. These LTs, widely supported in languages in high demand worldwide, such as English, are also necessary for smaller and less economically powerful languages, as they are a driving force in the democratization of the communities that use them due to their great social and cultural impact. As an example, dialogue systems allow us to communicate with machines in our own language; machine translation increases access to content in different languages, thus facilitating intercultural relations; and text-to-speech and speech-to-text systems broaden different categories of users’ access to technology. In the case of Galician (co-official language, together with Spanish, in the autonomous region of Galicia, located in northwestern Spain), incorporating the language into state-of-the-art AI applications can not only significantly favor its prestige (a decisive factor in language normalization), but also guarantee citizens’ language rights, reduce social inequality, and narrow the digital divide. This is the main motivation behind the Nós Project (Proxecto Nós), which aims to make a significant contribution to the development of LTs in Galician (currently considered a low-resource language) by providing openly licensed resources, tools, and demonstrators in the area of intelligent technologies.
Co-authors
- Pablo Gamallo 4
- Marcos Garcia 3
- Jeremy Barnes 2
- Larissa Freitas 2
- José Ramom Pichel Campos 2
- Eugénio Ribeiro 2
- Diana Santos 2
- Marlo Souza 2
- Jackson Wilke da Cruz Souza 2
- Silvia Paniagua Suárez 2
- Jamal Abdul Nasir 1
- Dima Abu Romi 1
- Cengiz Acarturk 1
- Matilda Agdler 1
- Rodrigo Agerri 1
- Anton Marius Alexandru 1
- Mohd Faizan Ansari 1
- Annalisa Arcidiacono 1
- Javier Aula-Blasco 1
- Hanne B. Søndergaard Knudsen 1
- Elizabete Ausma Velta Barisa 1
- Senén Barro 1
- Not Battesta Soliva 1
- Irene Baucells 1
- Ana Bautista 1
- Lisa Beinborn 1
- Yevgeni Berzak 1
- Nedeljka Bjelanović 1
- Anna Bondar 1
- Anna Isabelle Bothmann 1
- Jan Brasser 1
- Alberto Bugarín-Diz 1
- Caterina Cacioli 1
- Blanca Calvo Figueras 1
- Ilze Ceple 1
- Adelina Cerpja 1
- Dalí Chirino 1
- Jan Chromý 1
- Alessandro Corona Mendozza 1
- Nazik Dinctopal Deniz 1
- Cui Ding 1
- Ana Došen 1
- Kristian Elersič 1
- Julen Etxaniz 1
- Inmaculada Fajardo 1
- Júlia Falcão 1
- Stefan L. Frank 1
- Zigmunds Freibergs 1
- Angelina Ganebnaya 1
- Shan Gao 1
- Juan Garcia Amboage 1
- Jéssica Gomes 1
- Manuel González González 1
- Aitor González-Agirre 1
- Annjo Klungervik Greenall 1
- Alba Haveriku 1
- Miao He 1
- Anamaria Hodivoianu 1
- Nora Hollenstein 1
- Yu-Yin Hsu 1
- Amanda Isaksen 1
- Deborah N. Jakobi 1
- Andreia Janeiro 1
- Kristine Jensen de López 1
- Aleksandar Jevremovic 1
- Vojislav Jovanovic 1
- Lena Ann Jäger 1
- Ramunė Kasperė 1
- Nik Kharlamov 1
- Dorota Klimek-Jankowska 1
- Nelda Kote 1
- Vanja Kovic 1
- Sara Košutar 1
- Izabela Krejtz 1
- Thyra Krosness 1
- Oleksandra Kuvshynova 1
- Hanna Kędzierska 1
- Eilam Lavy 1
- Ella Lion 1
- Paula Luegi 1
- Kaidi Lõo 1
- Carmen Magariños 1
- Mircea Mihai Marin 1
- Clara Martin 1
- Ana Matić 1
- Svitlana Matvieieva 1
- Valeriia Modina 1
- Jurgita Motiejūnienė 1
- Diane C. Mézière 1
- Xavier Mínguez-López 1
- Marie-Luise Müller 1
- Tolgonai Nasipbek kyzy 1
- Johanne S. K. Nedergård 1
- Sergiu Nisioi 1
- John E. Ortega 1
- Pablo Gamallo Otero 1
- Daniel Bardanca Outeirinho 1
- Daniel Bardanca Outeiriño 1
- Patrizia Paggio 1
- Marijan Palmović 1
- Maria Christina Panagiotopoulou 1
- Alberto Parola 1
- Eva Pavlinušić Vilus 1
- Klaudia Petersen 1
- Anja Podlesek 1
- Eva Pospíšilová 1
- Marta Praulina 1
- Mikuláš Preininger 1
- Loredana Pungă 1
- Cristina Carbajal Pérez 1
- Naiara Pérez 1
- Helena Pérez 1
- Xosé Luis Regueira 1
- Elisa Fernández Rei 1
- German Rigau 1
- Diego Rossini 1
- Špela Rot 1
- Eneko Sagarzazu 1
- José Javier Saiz 1
- Anna Sallés 1
- Habib Sani Yahaya 1
- Irina A. Sekerina 1
- Robiert Sepúlveda-Torres 1
- Anne Gabija Skadina 1
- Jordi Solé-Casals 1
- Susana Sotelo 1
- Maja Stegenwallner-Schütz 1
- Chiara Tschirner 1
- Saara M. Varjopuro 1
- Spyridoula Varlokosta 1
- João Veríssimo 1
- Marta Villegas 1
- Oskari Juhapekka Virtanen 1
- Adina Ioana Vladu 1
- Nemanja Vračar 1
- Mila Vulchanova 1
- Ahmad Mustapha Wali 1
- Peizheng Wu 1
- Nilgün Yücel 1
- Lonneke van der Plas 1
- Anila Çepani 1
- Ayşegül Özkan 1
- Marta Łockiewicz 1