HistoryBankQA: Multilingual Temporal Question Answering on Historical Events

Biswadip Mandal, Anant Khandelwal, Manish Gupta


Abstract
Temporal reasoning over historical events is vital for temporal NLP tasks such as event extraction, entity linking, question answering (QA), timeline summarization, event clustering, and natural language inference. However, benchmarks for evaluating large language models (LLMs) on temporal reasoning remain limited. Existing datasets are small, lack multilingual coverage, and focus on recent events. To address this, we introduce HistoryBank, a multilingual database of 10M+ historical events sourced from Wikipedia timelines and infoboxes. Our database provides unprecedented coverage in both historical depth and linguistic breadth, spanning 10 languages. We also present a comprehensive benchmark covering 6 temporal QA tasks across all languages, evaluating models such as LLaMA-3-8B, Mistral-7B, Gemma-2-9B, Qwen3-8B, and GPT-4o. GPT-4o consistently performs best, while Gemma-2 leads among the smaller models. Our work offers a rich resource for advancing multilingual, temporally aware language understanding of historical events. To support further research, we publicly release our code and datasets. Code is available at https://github.com/mandalbiswadip/history-bank and data is available at https://drive.google.com/drive/folders/1vHudioDdI3EeYPbhYjKa0gimxaXvpxB2.
Anthology ID:
2026.starsem-conference.33
Volume:
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Saif M. Mohammad, Nedjma Ousidhoum
Venues:
*SEM | WS
Publisher:
Association for Computational Linguistics
Pages:
474–496
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.starsem-conference.33/
Cite (ACL):
Biswadip Mandal, Anant Khandelwal, and Manish Gupta. 2026. HistoryBankQA: Multilingual Temporal Question Answering on Historical Events. In Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026), pages 474–496, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
HistoryBankQA: Multilingual Temporal Question Answering on Historical Events (Mandal et al., *SEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.starsem-conference.33.pdf