WiC Evaluation in Galician and Spanish: Effects of Dataset Quality and Composition

Marta Vázquez Abuín, Marcos Garcia


Abstract
This work explores the impact of dataset quality and composition on Word-in-Context (WiC) performance for Galician and Spanish. We assess existing datasets, validate their test sets, and create new manually constructed evaluation data. Across five experiments with controlled variations in training and test data, we find that while validating test data tends to yield better model performance, evaluations on manually created datasets suggest that contextual embeddings are not sufficient on their own to reliably capture word meaning variation. Regarding training data, our results suggest that performance is influenced not only by size and human validation but also by deeper factors related to the semantic properties of the datasets. All new resources will be freely released.
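For readers new to the task: a WiC item pairs two sentences that contain the same target word, and a system must decide whether the word carries the same meaning in both. A common embedding-based baseline, sketched here under stated assumptions and not necessarily the setup used in this paper, compares the contextual embeddings of the target word in the two sentences with cosine similarity and predicts "same meaning" when the similarity reaches a threshold tuned on training pairs. The helper names (`cosine`, `predict`, `accuracy`, `tune_threshold`) and the 2-dimensional vectors are hypothetical placeholders; in practice the vectors would come from a contextual language model.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical type aliases: a Vector stands in for the contextual embedding of
# the target word in one sentence; a Pair covers the two sentences of a WiC item.
Vector = Sequence[float]
Pair = Tuple[Vector, Vector]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict(u: Vector, v: Vector, threshold: float) -> bool:
    """True means 'same meaning in both contexts', False means 'different'."""
    return cosine(u, v) >= threshold


def accuracy(pairs: Sequence[Pair], labels: Sequence[bool], threshold: float) -> float:
    """Share of pairs whose prediction matches the gold label."""
    correct = sum(
        predict(u, v, threshold) == gold for (u, v), gold in zip(pairs, labels)
    )
    return correct / len(labels)


def tune_threshold(pairs: Sequence[Pair], labels: Sequence[bool]) -> float:
    """Try every observed similarity as a cut-off and keep the most accurate one."""
    candidates: List[float] = sorted(cosine(u, v) for u, v in pairs)
    return max(candidates, key=lambda t: accuracy(pairs, labels, t))


# Toy vectors standing in for real contextual embeddings (hypothetical data).
train_pairs: List[Pair] = [
    ([1.0, 0.0], [0.9, 0.1]),  # same meaning
    ([1.0, 0.0], [0.0, 1.0]),  # different meaning
    ([0.5, 0.5], [0.6, 0.4]),  # same meaning
    ([1.0, 0.2], [0.1, 1.0]),  # different meaning
]
train_labels = [True, False, True, False]

test_pairs: List[Pair] = [
    ([0.8, 0.2], [0.7, 0.3]),  # same meaning
    ([0.0, 1.0], [1.0, 0.1]),  # different meaning
]
test_labels = [True, False]

if __name__ == "__main__":
    threshold = tune_threshold(train_pairs, train_labels)
    print(f"threshold={threshold:.4f}")
    print(f"test accuracy={accuracy(test_pairs, test_labels, threshold):.2f}")
```

Because the whole decision rests on a single similarity score and a cut-off learned from training pairs, a baseline of this kind can shift noticeably when the training data changes, which is one simple way dataset composition can enter WiC results.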
Anthology ID: 2025.starsem-1.13
Volume: Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)
Month: November
Year: 2025
Address: Suzhou, China
Editors: Lea Frermann, Mark Stevenson
Venue: *SEM
Publisher: Association for Computational Linguistics
Pages: 172–178
URL: https://preview.aclanthology.org/ingest-emnlp/2025.starsem-1.13/
Cite (ACL): Marta Vázquez Abuín and Marcos Garcia. 2025. WiC Evaluation in Galician and Spanish: Effects of Dataset Quality and Composition. In Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025), pages 172–178, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): WiC Evaluation in Galician and Spanish: Effects of Dataset Quality and Composition (Abuín & Garcia, *SEM 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.starsem-1.13.pdf