KoViDoRe: A Benchmark for Korean Visual Document Retrieval

Yongbin Choi; Yongwoo Song; Mujeen Sung

Abstract

Recent advances in multimodal retrieval have improved the ability to retrieve information from visually rich documents such as PDFs and reports. However, existing benchmarks remain largely centered on English and provide limited coverage of Korean visual documents with complex structures. Furthermore, most existing Korean resources primarily evaluate single-page retrieval, failing to capture realistic scenarios that require evidence aggregation across multiple pages. To address these gaps, we introduce KoViDoRe, a benchmark for Korean visual document retrieval. The dataset is constructed from publicly available Korean documents with diverse layouts, including tables, figures, and multi-column structures. We develop a multi-stage data curation pipeline consisting of structured document parsing, synthetic query generation using both summary-based and context-based strategies, and relevance mapping with human verification. Using KoViDoRe, we evaluate a wide range of multimodal retrieval models and observe that current models struggle to effectively handle Korean visual document retrieval, particularly in settings involving structured content and diverse query types. Motivated by this finding, we further curate a large-scale training dataset, Ko-VDR Train Public, to support the development of retrieval models tailored to Korean visual documents. Together, KoViDoRe and Ko-VDR Train Public provide a unified benchmark and training resource for Korean visual document retrieval.

Anthology ID: 2026.magmar-main.11
Volume: Proceedings of the 2nd Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2026)
Month: July
Year: 2026
Address: San Diego, USA
Editors: Kenton Murray, Reno Kriz
Venues: MAGMaR | WS
Publisher: Association for Computational Linguistics
Pages: 54–80
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.magmar-main.11/
Cite (ACL): Yongbin Choi, Yongwoo Song, and Mujeen Sung. 2026. KoViDoRe: A Benchmark for Korean Visual Document Retrieval. In Proceedings of the 2nd Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2026), pages 54–80, San Diego, USA. Association for Computational Linguistics.
Cite (Informal): KoViDoRe: A Benchmark for Korean Visual Document Retrieval (Choi et al., MAGMaR 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.magmar-main.11.pdf
