DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering

Le Qi; Shangwen Lv; Hongyu Li; Jing Liu; Yu Zhang; Qiaoqiao She; Hua Wu; Haifeng Wang; Ting Liu

doi:10.18653/v1/2022.findings-acl.105

Abstract

Open-domain question answering is used in a wide range of applications, such as web search and enterprise search, and usually takes clean text extracted from documents in various formats (e.g., web pages, PDFs, or Word documents) as its information source. However, designing a separate text extraction approach for each format is time-consuming and not scalable. To reduce human cost and improve the scalability of QA systems, we propose and study an Open-domain Document Visual Question Answering (Open-domain DocVQA) task, which requires answering questions directly from a collection of document images rather than from document text alone, additionally making use of layout and visual features. To this end, we introduce the first Chinese Open-domain DocVQA dataset, called DuReader_vis, which contains about 15K question-answer pairs and 158K document images from the Baidu search engine. DuReader_vis poses three main challenges: (1) long document understanding, (2) noisy texts, and (3) multi-span answer extraction. Extensive experiments demonstrate that the dataset is challenging. Additionally, we propose a simple approach that incorporates layout and visual features, and experimental results show its effectiveness. The dataset and code will be publicly available at https://github.com/baidu/DuReader/tree/master/DuReader-vis.

Anthology ID: 2022.findings-acl.105
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1338–1351
URL: https://aclanthology.org/2022.findings-acl.105
DOI: 10.18653/v1/2022.findings-acl.105
Cite (ACL): Le Qi, Shangwen Lv, Hongyu Li, Jing Liu, Yu Zhang, Qiaoqiao She, Hua Wu, Haifeng Wang, and Ting Liu. 2022. DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1338–1351, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering (Qi et al., Findings 2022)
PDF: https://preview.aclanthology.org/landing_page/2022.findings-acl.105.pdf
Code: baidu/DuReader
Data: DocVQA, InfographicVQA, Natural Questions, VisualMRC
