Rostyslav O. Hryniv

2026

From Sparse to Sense-Grounded: Wikipedia Training for Ukrainian Visual-WSD
Yurii Laba | Rostyslav O. Hryniv
Proceedings of the 30th Conference on Computational Natural Language Learning

Visual Word Sense Disambiguation (Visual-WSD) requires ranking the correct image for an ambiguous word given a short trigger phrase. For low-resource languages, the task is bottlenecked by scarce sense-level benchmarks and limited sense-aligned multimodal supervision. We study Ukrainian and (i) extend the Ukrainian Visual-WSD benchmark from 87 to 381 instances and benchmark multilingual CLIP checkpoints and multimodal large models, and (ii) introduce two scalable Wikipedia-derived dataset construction methods. Using compute-efficient adaptation, we fine-tune a multilingual CLIP backbone and show that sense-grounded supervision drives the improvements: combining our two Wikipedia-derived datasets improves HIT@1 from 37.00% to 43.05%.
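
The abstract reports results as HIT@1, which in a ranking setting is commonly read as the fraction of instances whose top-ranked candidate image is the correct one. The following is a minimal sketch of that reading, not the authors' evaluation code. The function name `hit_at_1`, the score layout (one row of similarity scores per instance, one score per candidate image), and the toy numbers are all assumptions made for illustration.

```python
from typing import Sequence


def hit_at_1(scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of instances whose highest-scoring candidate is the gold image.

    scores[i][j] is an assumed similarity between the trigger phrase of
    instance i and its j-th candidate image (for example, a text-image
    similarity from a CLIP-style model). gold[i] is the index of the
    correct image for instance i.
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    if not scores:
        raise ValueError("at least one instance is required")

    hits = 0
    for row, gold_index in zip(scores, gold):
        # Index of the top-ranked candidate for this instance.
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == gold_index)
    return hits / len(scores)


# Toy data (made up): three instances, three candidate images each.
toy_scores = [
    [0.9, 0.1, 0.3],  # top candidate 0, gold 0: hit
    [0.2, 0.5, 0.4],  # top candidate 1, gold 2: miss
    [0.1, 0.2, 0.8],  # top candidate 2, gold 2: hit
]
toy_gold = [0, 2, 2]

print(f"HIT@1 = {hit_at_1(toy_scores, toy_gold):.2%}")
```

On the toy data two of three instances are hits. Ties are broken in favor of the lowest candidate index here, which is a simplification.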

Co-authors

Yurii Laba (1)

Venues

CoNLL (1)
WS (1)
