AraVQA: Building a New Arabic Factoid Visual Question Answering Dataset from Wikipedia
Sultan Alrowili, Younes Samih, Abed Alhakim Freihat, Mathan Kumar Eswaran
Abstract
The development of large-scale Visual Question Answering (VQA) datasets has traditionally relied on resource-intensive manual annotation. In addition, most existing Arabic VQA datasets focus on culturally specific and dialect-aware domains. To address these limitations, we propose a new pipeline that leverages Wikipedia template tags to extract the relevant information for each image, which is subsequently used by a Large Language Model (LLM) to synthetically generate a new visual question answering dataset. Using this pipeline, we have constructed AraVQA, the most comprehensive Arabic Factoid Visual Question Answering dataset, containing more than 50,000 questions and covering over 20 varied primary subjects within Arabic general knowledge. Our detailed analysis shows that our dataset can serve as a post-training dataset to enhance the performance of existing Visual Language Models (VLMs) on Arabic VQA tasks. Furthermore, we present a novel benchmark, derived from our dataset and validated through manual annotation, that poses more challenges to Arabic VLMs than existing Arabic VQA datasets.
- Anthology ID:
- 2026.acl-long.91
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2026–2042
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.91/
- Cite (ACL):
- Sultan Alrowili, Younes Samih, Abed Alhakim Freihat, and Mathan Kumar Eswaran. 2026. AraVQA: Building a New Arabic Factoid Visual Question Answering Dataset from Wikipedia. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2026–2042, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- AraVQA: Building a New Arabic Factoid Visual Question Answering Dataset from Wikipedia (Alrowili et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.91.pdf
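
The abstract describes a two-stage pipeline: extract image-related information from Wikipedia template tags, then pass it to an LLM to generate question-answer pairs. The following is only a rough sketch of that idea, not the authors' implementation. It assumes the tags are infobox-style `| key = value` parameters (the abstract does not specify this), and the sample wikitext, the field names, and the helpers `extract_template_fields` and `build_prompt` are all hypothetical. No LLM is called; the sketch stops at building the prompt text.

```python
import re

# Hypothetical wikitext for one article; field names are illustrative only.
SAMPLE_WIKITEXT = """{{Infobox building
| name = Example Tower
| image = Example_Tower.jpg
| location = Example City
| completed = 1999
}}"""

# Matches simple one-line template parameters of the form "| key = value".
# Assumption: nested templates and multi-line values are out of scope here.
FIELD_RE = re.compile(
    r"^\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def extract_template_fields(wikitext):
    """Return {field: value} for non-empty one-line template parameters."""
    return {key: value for key, value in FIELD_RE.findall(wikitext) if value}


def build_prompt(fields, image_key="image"):
    """Build a text prompt asking an LLM for factoid Q&A pairs about the image."""
    # Every field except the image file name is treated as a known fact.
    facts = [f"- {k}: {v}" for k, v in fields.items() if k != image_key]
    return (
        f"Image file: {fields.get(image_key, 'unknown')}\n"
        "Known facts:\n" + "\n".join(facts) + "\n"
        "Write Arabic factoid question-answer pairs answerable from these facts."
    )


fields = extract_template_fields(SAMPLE_WIKITEXT)
prompt = build_prompt(fields)
print(prompt)
```

In a fuller pipeline, the prompt would be sent to whichever LLM is used for generation, and the returned pairs would be stored alongside the image. Those steps depend on details the abstract does not give, so they are left out.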