Daniel Solla
2026
Incorporating Multiword Expressions in Galician Neural Machine Translation: Compositionality, Efficiency, and Performance
Daniel Solla | Paula Pinto-Ferro | Laura Castro | Pablo Gamallo | Marcos Garcia
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
This paper explores the behavior of neural machine translation models on two newly introduced datasets containing noun-adjective multiword expressions (MWEs) with different degrees of semantic ambiguity and compositionality. We compare general-domain machine translation systems with fine-tuned models exposed to small subsets of the target MWEs. By assessing the effects of the number of learning steps and of corpus size, we find that carefully designed fine-tuning may improve MWE handling while mitigating catastrophic forgetting. However, our error analysis reveals that the models still struggle in several scenarios, particularly when translating MWEs with idiomatic meanings. Both the datasets and the experiments focus on translation involving Galician, English, and Spanish.