Antoni Brosa Rodríguez
Also published as: Antoni Brosa-Rodriguez, Antoni Brosa-Rodríguez
2026
A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding
Dilara Torunoğlu-Selamet | Doğukan Arslan | Rodrigo Wilkens | Wei He | Doruk Eryiğit | Thomas Pickard | Adriana S. Pagano | Aline Villavicencio | Gülşen Eryiğit | Ágnes Abuczki | Aida Cardoso | Alesia Lazarenka | Dina Almassova | Amália Mendes | Anna Kanellopoulou | Antoni Brosa-Rodriguez | Baiba Valkovska | Beata Wojtowicz | Bolette Pedersen | Carlos Manuel Hidalgo-Ternero | Chaya Liebeskind | Danka Jokić | Diego Alves | Eleni Triantafyllidi | Erik Velldal | Fred Philippy | Giedre Valunaite Oleskeviciene | Ieva Rizgeliene | Inguna Skadina | Irina Lobzhanidze | Isabell Stinessen Haugen | Jauza Akbar Krito | Jelena M. Marković | Johanna Monti | Josue Alejandro Sauca | Kaja Dobrovoljc Zor | Kingsley O. Ugwuanyi | Laura Rituma | Lilja Øvrelid | Maha Tufail Agro | Manzura Abjalova | Maria Chatzigrigoriou | María del Mar Sánchez Ramos | Marija Pendevska | Masoumeh Seyyedrezaei | Mehrnoush Shamsfard | Momina Ahsan | Muhammad Ahsan Riaz Khan | Nathalie Carmen Hau Norman | Nilay Erdem Ayyıldız | Nina Hosseini-Kivanani | Noémi Ligeti-Nagy | Numaan Naeem | Olha Kanishcheva | Olha Yatsyshyna | Daniil Orel | Petra Giommarelli | Petya Osenova | Radovan Garabik | Regina E. Semou | Rozane Rebechi | Salsabila Zahirah Pranida | Samia Touileb | Sanni Nimb | Sarfraz Ahmad | Sarvinoz Sharipova | Shahar Golan | Shaoxiong Ji | Sopuruchi Christian Aboh | Srdjan Sucur | Stella Markantonatou | Sussi Olsen | Vahide Tajalli | Veronika Lipp | Voula Giouli | Yelda Yeşildal Eraydın | Zahra Saaberi | Zhuohan Xie
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Dilara Torunoğlu-Selamet | Doğukan Arslan | Rodrigo Wilkens | Wei He | Doruk Eryiğit | Thomas Pickard | Adriana S. Pagano | Aline Villavicencio | Gülşen Eryiğit | Ágnes Abuczki | Aida Cardoso | Alesia Lazarenka | Dina Almassova | Amália Mendes | Anna Kanellopoulou | Antoni Brosa-Rodriguez | Baiba Valkovska | Beata Wojtowicz | Bolette Pedersen | Carlos Manuel Hidalgo-Ternero | Chaya Liebeskind | Danka Jokić | Diego Alves | Eleni Triantafyllidi | Erik Velldal | Fred Philippy | Giedre Valunaite Oleskeviciene | Ieva Rizgeliene | Inguna Skadina | Irina Lobzhanidze | Isabell Stinessen Haugen | Jauza Akbar Krito | Jelena M. Marković | Johanna Monti | Josue Alejandro Sauca | Kaja Dobrovoljc Zor | Kingsley O. Ugwuanyi | Laura Rituma | Lilja Øvrelid | Maha Tufail Agro | Manzura Abjalova | Maria Chatzigrigoriou | María del Mar Sánchez Ramos | Marija Pendevska | Masoumeh Seyyedrezaei | Mehrnoush Shamsfard | Momina Ahsan | Muhammad Ahsan Riaz Khan | Nathalie Carmen Hau Norman | Nilay Erdem Ayyıldız | Nina Hosseini-Kivanani | Noémi Ligeti-Nagy | Numaan Naeem | Olha Kanishcheva | Olha Yatsyshyna | Daniil Orel | Petra Giommarelli | Petya Osenova | Radovan Garabik | Regina E. Semou | Rozane Rebechi | Salsabila Zahirah Pranida | Samia Touileb | Sanni Nimb | Sarfraz Ahmad | Sarvinoz Sharipova | Shahar Golan | Shaoxiong Ji | Sopuruchi Christian Aboh | Srdjan Sucur | Stella Markantonatou | Sussi Olsen | Vahide Tajalli | Veronika Lipp | Voula Giouli | Yelda Yeşildal Eraydın | Zahra Saaberi | Zhuohan Xie
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Potentially idiomatic expressions (PIEs) carry meanings inherently tied to the everyday experience of a given language community. As such, they constitute an interesting challenge for assessing the linguistic (and to some extent cultural) capabilities of NLP systems. In this paper, we present XMPIE, a parallel multilingual and multimodal dataset of potentially idiomatic expressions. The dataset, containing 34 languages and over ten thousand items, allows comparative analyses of idiomatic patterns among language-specific realisations and preferences in order to gather insights about shared cultural aspects. This parallel dataset allows evaluation of language model performance for a given PIE in different languages and whether idiomatic understanding in one language can be transferred to another. Moreover, the dataset supports the study of PIEs across textual and visual modalities, to measure to what extent PIE understanding in one modality transfers or implies in understanding in another modality (text vs. image). The data was created by language experts, with both textual and visual components crafted under multilingual guidelines, and each PIE is accompanied by five images representing a spectrum from idiomatic to literal meanings, including semantically related and random distractors. The result is a high-quality benchmark for evaluating multilingual and multimodal idiomatic language understanding.
2025
Beyond the Data: The Impact of Annotation Inconsistencies in UD Treebanks on Typological Universals and Complexity Assessment
Antoni Brosa Rodríguez | M. Dolores Jiménez López
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Antoni Brosa Rodríguez | M. Dolores Jiménez López
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
This study explores the impact of annotation inconsistencies in Universal Dependencies (UD) treebanks on typological research in computational linguistics. UD provides a standardized framework for cross-linguistic annotation, facilitating large-scale empirical studies on linguistic diversity and universals. However, despite rigorous guidelines, annotation inconsistencies persist across treebanks. The objective of this paper is to assess how these inconsistencies affect typological universals, linguistic descriptions, and complexity metrics. We analyze systematic annotation errors in multiple UD treebanks, focusing on morphological features. Case studies on Spanish and Dutch demonstrate how differing annotation decisions within the same language create contradictory typological profiles. We classify the errors into two main categories: overgeneration errors (features incorrectly annotated, since do not actually exist in a language) and data omission errors (inconsistent or incomplete annotation of features that do exist). Our results show that these inconsistencies significantly distort typological analyses, leading to false generalizations and miscalculations of linguistic complexity. We propose methodological safeguards for typological research using UD data. Our findings highlight the need for methodological improvements to ensure more reliable cross-linguistic generalizations in computational typology.
2024
New Proposal of Greenberg’s Universal 14 from Typometrics
Antoni Brosa-Rodríguez | Sylvain Kahane
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Antoni Brosa-Rodríguez | Sylvain Kahane
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In his Universal 14, Greenberg stated that the normal and dominant order in all world languages was to place the condition before the conclusion in conditional sentences. We take this claim to review it quantitatively and based on occurrences in real texts in more than 50 languages. We can see that Greenberg’s proposal is correct but that it needs a reformulation to be true at all. We propose a quantitatively based and updated Universal 14, which gives a better account of the representation of the different languages analyzed and which is fulfilled in 100% of the cases (as opposed to Greenberg’s 60% in our sample). In addition, we also analyze adverbial sentences. Once we obtain the occurrence data in their direction (before or after the main verb), we plot a new Universal in a typometrical way: 100% of the languages show a higher proportion of preceding conditional clauses than of adverbial clauses, regardless of their type or the direction preference for adverbial clauses. The relationship between the SOV type and a stricter initial conditional location is also proposed.
Search
Fix author
Co-authors
- Manzura Abjalova 1
- Sopuruchi Christian Aboh 1
- Ágnes Abuczki 1
- Maha Tufail Agro 1
- Sarfraz Ahmad 1
- Momina Ahsan 1
- Dina Almassova 1
- Diego Alves 1
- Doğukan Arslan 1
- Aida Cardoso 1
- Maria Chatzigrigoriou 1
- Kaja Dobrovoljc 1
- Nilay Erdem Ayyıldız 1
- Doruk Eryiğit 1
- Gülşen Eryiğit 1
- Radovan Garabik 1
- Petra Giommarelli 1
- Voula Giouli 1
- Shahar Golan 1
- Isabell Stinessen Haugen 1
- Wei He 1
- Carlos Manuel Hidalgo-Ternero 1
- Nina Hosseini-Kivanani 1
- Shaoxiong Ji 1
- Danka Jokić 1
- Sylvain Kahane 1
- Anna Kanellopoulou 1
- Olha Kanishcheva 1
- Muhammad Ahsan Riaz Khan 1
- Jauza Akbar Krito 1
- Alesia Lazarenka 1
- Chaya Liebeskind 1
- Noémi Ligeti-Nagy 1
- Veronika Lipp 1
- Irina Lobzhanidze 1
- M. Dolores Jiménez López 1
- Stella Markantonatou 1
- Jelena M. Marković 1
- Amália Mendes 1
- Johanna Monti 1
- Numaan Naeem 1
- Sanni Nimb 1
- Nathalie Carmen Hau Norman 1
- Sussi Olsen 1
- Daniil Orel 1
- Petya Osenova 1
- Adriana Silvina Pagano 1
- Bolette Sandford Pedersen 1
- Marija Pendevska 1
- Fred Philippy 1
- Thomas Pickard 1
- Salsabila Zahirah Pranida 1
- María Del Mar Sánchez Ramos 1
- Rozane Rebechi 1
- Laura Rituma 1
- Ieva Rizgeliene 1
- Zahra Saaberi 1
- Josue Alejandro Sauca 1
- Regina E. Semou 1
- Masoumeh Seyyedrezaei 1
- Mehrnoush Shamsfard 1
- Sarvinoz Sharipova 1
- Inguna Skadina 1
- Srdjan Sucur 1
- Vahide Tajalli 1
- Dilara Torunoğlu-Selamet 1
- Samia Touileb 1
- Eleni Triantafyllidi 1
- Kingsley O. Ugwuanyi 1
- Baiba Valkovska 1
- Giedre Valunaite Oleskeviciene 1
- Erik Velldal 1
- Aline Villavicencio 1
- Rodrigo Wilkens 1
- Beata Wójtowicz 1
- Zhuohan Xie 1
- Olha Yatsyshyna 1
- Yelda Yeşildal Eraydın 1
- Lilja Øvrelid 1