A Dataset for Multimodal Question Answering in the Cultural Heritage Domain

Shurong Sheng, Luc Van Gool, Marie-Francine Moens


Abstract
Multimodal question answering in the cultural heritage domain allows visitors to ask questions in a more natural way and thus provides a better user experience with cultural objects while visiting a museum, landmark or any other historical site. In this paper, we introduce the construction of a gold standard dataset that will aid research on multimodal question answering in the cultural heritage domain. The dataset, which will soon be released to the public, contains multimodal content including images of typical artworks from the fascinating ancient Egyptian Amarna period, related image-containing documents about the artworks, and over 800 multimodal queries integrating visual and textual questions. The multimodal questions and related documents are all in English. Each multimodal question is linked to the relevant paragraphs in the related documents that contain the answer to the query.
Anthology ID:
W16-4003
Volume:
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
LT4DH
Publisher:
The COLING 2016 Organizing Committee
Pages:
10–17
URL:
https://aclanthology.org/W16-4003
Cite (ACL):
Shurong Sheng, Luc Van Gool, and Marie-Francine Moens. 2016. A Dataset for Multimodal Question Answering in the Cultural Heritage Domain. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pages 10–17, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
A Dataset for Multimodal Question Answering in the Cultural Heritage Domain (Sheng et al., LT4DH 2016)
PDF:
https://preview.aclanthology.org/auto-file-uploads/W16-4003.pdf
Data
Visual Question Answering