Tetiana Bas
2025
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains
Yurii Paniv | Artur Kiulian | Dmytro Chaplynskyi | Mykola Khandoga | Anton Polishko | Tetiana Bas | Guillermo Gabrielli
Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
While the evaluation of multimodal English-centric models is an active area of research with numerous benchmarks, there is a profound lack of benchmarks or evaluation suites for low- and mid-resource languages. We introduce ZNO-Vision, a comprehensive multimodal Ukrainian-centric benchmark derived from the standardized university entrance examination (ZNO). The benchmark consists of over 4300 expert-crafted questions spanning 12 academic disciplines, including mathematics, physics, chemistry, and humanities. We evaluated the performance of both open-source models and API providers, finding that only a handful of models performed above baseline. Alongside the new benchmark, we performed the first evaluation study of multimodal text generation for the Ukrainian language: we measured caption generation quality on the Multi30K-UK dataset. Lastly, we tested a few models from a cultural perspective on knowledge of national cuisine. We believe our work will advance multimodal generation capabilities for the Ukrainian language and our approach could be useful for other low-resource languages.