Challenging Multimodal LLMs with African Standardized Exams: A Document VQA Evaluation
Victor Tolulope Olufemi, Oreoluwa Boluwatife Babatunde, Emmanuel Bolarinwa, Kausar Yetunde Moshood
Abstract
Despite rapid advancements in multimodal large language models (MLLMs), their ability to process low-resource African languages in document-based visual question answering (VQA) tasks remains limited. This paper evaluates three state-of-the-art MLLMs (GPT-4o, Claude-3.5 Haiku, and Gemini-1.5 Pro) on WAEC/NECO standardized exam questions in Yoruba, Igbo, and Hausa. We curate a dataset of multiple-choice questions from exam images and compare model accuracies across two prompting strategies: (1) using English prompts for African language questions, and (2) using native-language prompts. While GPT-4o achieves over 90% accuracy for English, performance drops below 40% for African languages, highlighting severe data imbalance in model training. Notably, native-language prompting improves accuracy for most models, yet no system approaches human-level performance, which reaches over 50% in Yoruba, Igbo, and Hausa. These findings emphasize the need for diverse training data, fine-tuning, and dedicated benchmarks that address the linguistic intricacies of African languages in multimodal tasks, paving the way for more equitable and effective AI systems in education.
- Anthology ID:
- 2025.africanlp-1.22
- Volume:
- Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Constantine Lignos, Idris Abdulmumin, David Adelani
- Venues:
- AfricaNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 150–157
- URL:
- https://preview.aclanthology.org/landing_page/2025.africanlp-1.22/
- DOI:
- 10.18653/v1/2025.africanlp-1.22
- Cite (ACL):
- Victor Tolulope Olufemi, Oreoluwa Boluwatife Babatunde, Emmanuel Bolarinwa, and Kausar Yetunde Moshood. 2025. Challenging Multimodal LLMs with African Standardized Exams: A Document VQA Evaluation. In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), pages 150–157, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Challenging Multimodal LLMs with African Standardized Exams: A Document VQA Evaluation (Olufemi et al., AfricaNLP 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.africanlp-1.22.pdf