Rostyslav O. Hryniv
2026
From Sparse to Sense-Grounded: Wikipedia Training for Ukrainian Visual-WSD
Yurii Laba | Rostyslav O. Hryniv
Proceedings of the 30th Conference on Computational Natural Language Learning
Visual Word Sense Disambiguation (Visual-WSD) requires ranking a set of candidate images so that the one depicting the intended sense of an ambiguous word comes first, given a short trigger phrase. For low-resource languages, the task is bottlenecked by scarce sense-level benchmarks and limited sense-aligned multimodal supervision. We study Ukrainian and (i) extend the Ukrainian Visual-WSD benchmark from 87 to 381 instances and benchmark multilingual CLIP checkpoints and large multimodal models, and (ii) introduce two scalable Wikipedia-derived dataset construction methods. Using compute-efficient adaptation, we fine-tune a multilingual CLIP backbone and show that sense-grounded supervision drives the improvements: combining our two Wikipedia-derived datasets improves HIT@1 from 37.00% to 43.05%.
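To make the ranking task and the HIT@1 metric concrete, here is a minimal sketch. It is not the authors' evaluation code. It assumes that a CLIP-style dual encoder has already produced one embedding for the trigger phrase and one embedding per candidate image. Candidates are ranked by cosine similarity, and HIT@1 is the fraction of instances whose gold image is ranked first. The function names (`cosine`, `rank_candidates`, `hit_at_1`) are hypothetical, and the 2-D vectors are toy data rather than real model outputs.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical instance format: (text embedding, candidate image embeddings, gold index).
Instance = Tuple[Sequence[float], List[Sequence[float]], int]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(text_emb: Sequence[float], image_embs: List[Sequence[float]]) -> List[int]:
    """Return candidate indices sorted by descending similarity to the trigger phrase."""
    scores = [cosine(text_emb, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


def hit_at_1(instances: List[Instance]) -> float:
    """Fraction of instances whose top-ranked candidate is the gold image."""
    if not instances:
        raise ValueError("HIT@1 is undefined for an empty evaluation set")
    hits = sum(
        1
        for text_emb, image_embs, gold in instances
        if rank_candidates(text_emb, image_embs)[0] == gold
    )
    return hits / len(instances)


# Toy example: the first instance ranks the gold image first (a hit);
# the second ranks a distractor above the gold image (a miss).
instances: List[Instance] = [
    ([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]], 1),
    ([0.0, 1.0], [[1.0, 0.0], [0.2, 0.8], [0.0, 1.0]], 1),
]

print(hit_at_1(instances))  # 0.5
```

Under this definition, the reported change from 37.00% to 43.05% HIT@1 corresponds to a gain of 6.05 percentage points in the share of instances where the correct image is ranked first.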