Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains

Yurii Paniv; Artur Kiulian; Dmytro Chaplynskyi; Mykola Khandoga; Anton Polishko; Tetiana Bas; Guillermo Gabrielli

Abstract

While the evaluation of multimodal English-centric models is an active area of research with numerous benchmarks, there is a profound lack of benchmarks or evaluation suites for low- and mid-resource languages. We introduce ZNO-Vision, a comprehensive multimodal Ukrainian-centric benchmark derived from the standardized university entrance examination (ZNO). The benchmark consists of over 4300 expert-crafted questions spanning 12 academic disciplines, including mathematics, physics, chemistry, and humanities. We evaluated the performance of both open-source models and API providers, finding that only a handful of models performed above baseline. Alongside the new benchmark, we performed the first evaluation study of multimodal text generation for the Ukrainian language: we measured caption generation quality on the Multi30K-UK dataset. Lastly, we tested a few models from a cultural perspective on knowledge of national cuisine. We believe our work will advance multimodal generation capabilities for the Ukrainian language and our approach could be useful for other low-resource languages.

Anthology ID: 2025.unlp-1.2
Volume: Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
Month: July
Year: 2025
Address: Vienna, Austria (online)
Editor: Mariana Romanyshyn
Venues: UNLP | WS
Publisher: Association for Computational Linguistics
Pages: 14–26
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.unlp-1.2/
Cite (ACL): Yurii Paniv, Artur Kiulian, Dmytro Chaplynskyi, Mykola Khandoga, Anton Polishko, Tetiana Bas, and Guillermo Gabrielli. 2025. Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains. In Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025), pages 14–26, Vienna, Austria (online). Association for Computational Linguistics.
Cite (Informal): Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains (Paniv et al., UNLP 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.unlp-1.2.pdf