Gathering Compositionality Ratings of Ambiguous Noun-Adjective Multiword Expressions in Galician

Laura Castro, Marcos Garcia


Abstract
Multiword expressions (MWEs) pose numerous challenges to most NLP tasks, as do their compositionality and semantic ambiguity. Resources for exploring these phenomena are urgently needed, especially for low-resource languages. In this paper, we present a dataset of Galician noun-adjective compounds annotated with token-level compositionality scores. These MWEs are ambiguous for two reasons: they are potentially idiomatic, and their constituents are themselves ambiguous and productive. The dataset comprises 240 MWEs amounting to 322 senses. These senses are contextualized in two sets of sentences, one manually created and one extracted from corpora, totaling 1,858 examples. For this dataset, we gathered human judgments of the compositionality of compounds, heads, and modifiers. We also obtained frequency, ambiguity, and productivity data for compounds and their constituents, and explored potential correlations between these three properties and the mean compositionality scores of compounds, heads, and modifiers. This resource supports the evaluation of language models on (non-)compositionality and ambiguity, key challenges in NLP. It is especially relevant for Galician, a low-resource variety lacking annotated datasets for these linguistic phenomena.
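The abstract mentions averaging human judgments into mean compositionality scores and correlating them with frequency, ambiguity, and productivity. The sketch below shows one way such an analysis could be computed. It assumes Spearman's rank correlation, which the abstract does not specify. It also assumes a simple `{item: [scores]}` layout for the ratings. The function names (`mean_scores`, `average_ranks`, `spearman`) and the toy numbers are hypothetical, and this is not the authors' actual pipeline.

```python
import math
from statistics import mean


def mean_scores(ratings):
    """Map each item (e.g. a compound in context) to the mean of its annotators' scores."""
    return {item: mean(scores) for item, scores in ratings.items()}


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks (assumed measure)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data, NOT taken from the dataset described above.
    ratings = {"mwe_a": [1, 2, 1], "mwe_b": [4, 5, 5], "mwe_c": [3, 3, 4]}
    frequency = {"mwe_a": 10, "mwe_b": 200, "mwe_c": 50}
    means = mean_scores(ratings)
    items = sorted(means)
    rho = spearman([means[i] for i in items], [frequency[i] for i in items])
    print(f"Spearman rho = {rho:.2f}")  # prints: Spearman rho = 1.00
```

Ranks are used rather than raw values because frequency counts are typically heavily skewed. Averaging the ranks of ties keeps the measure well defined when several items share the same mean score. The same `spearman` call would apply unchanged to head or modifier scores paired with ambiguity or productivity values.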
Anthology ID:
2025.mwe-1.5
Volume:
Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, U.S.A.
Editors:
Atul Kr. Ojha, Voula Giouli, Verginica Barbu Mititelu, Mathieu Constant, Gražina Korvel, A. Seza Doğruöz, Alexandre Rademaker
Venues:
MWE | WS
Publisher:
Association for Computational Linguistics
Pages:
32–40
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.mwe-1.5/
Cite (ACL):
Laura Castro and Marcos Garcia. 2025. Gathering Compositionality Ratings of Ambiguous Noun-Adjective Multiword Expressions in Galician. In Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025), pages 32–40, Albuquerque, New Mexico, U.S.A. Association for Computational Linguistics.
Cite (Informal):
Gathering Compositionality Ratings of Ambiguous Noun-Adjective Multiword Expressions in Galician (Castro & Garcia, MWE 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.mwe-1.5.pdf