GeoBenchmark: Probing Large Language Models for Geo-Spatial Knowledge

Ayomide Abayomi, Jose G. Moreno, Karim Radouane, Lynda Tamine


Abstract
Large Language Models (LLMs) demonstrate strong factual recall of general-purpose knowledge but struggle with grounded geospatial knowledge. To probe and measure the spatial knowledge of LLMs, we present GeoBenchmark, a benchmark for evaluating geographic commonsense along three core spatial relations: direction, distance, and topology. Using data extracted from YAGO2geo and Ordnance Survey ward geometries, we formalize spatial relations as structured triplets and systematically transform them into balanced binary (Yes/No) and Multiple-Choice (MCQ) question-answer pairs. We also distinguish atomic from composite questions according to the number of spatial relations involved. The resulting dataset comprises 26k binary and 13k MCQ samples, uniformly distributed across atomic, binary, and ternary relation levels. We establish baselines with LLaMA-8B and Mistral-7B under zero-shot prompting; both reach 52–63% accuracy on atomic questions but fall below 35% on ternary relations, which exposes the models’ limited compositional spatial understanding and strong option bias. GeoBenchmark provides a comprehensive, reproducible resource for probing and advancing LLMs’ geographic commonsense, paving the way for future research on spatial and geographic probing of LLMs as well as knowledge editing.
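As a rough illustration of the triplet-to-question transformation described in the abstract, the following Python sketch turns one direction triplet into a balanced Yes/No pair and a four-option MCQ. The `Triplet` type, the `DIRECTIONS` label set, the question templates, and both function names are assumptions made for illustration; they are not taken from the GeoBenchmark release.

```python
import random
from typing import NamedTuple


class Triplet(NamedTuple):
    """A spatial fact (head, relation, tail); hypothetical structure."""
    head: str
    relation: str
    tail: str


# Assumed label set for the direction relation (illustrative only).
DIRECTIONS = ["north of", "south of", "east of", "west of"]


def to_binary_pairs(t: Triplet, rng: random.Random) -> list[tuple[str, str]]:
    """Emit one Yes and one No question per triplet so labels stay balanced."""
    # The negative question swaps in a randomly chosen incorrect relation.
    wrong = rng.choice([d for d in DIRECTIONS if d != t.relation])
    return [
        (f"Is {t.head} {t.relation} {t.tail}?", "Yes"),
        (f"Is {t.head} {wrong} {t.tail}?", "No"),
    ]


def to_mcq(t: Triplet, rng: random.Random) -> tuple[str, str]:
    """Build a four-option question and return it with the correct letter."""
    options = DIRECTIONS[:]
    # Shuffle so the correct answer is not tied to a fixed option position.
    rng.shuffle(options)
    letters = "ABCD"
    answer = letters[options.index(t.relation)]
    lines = [f"{letter}. {option}" for letter, option in zip(letters, options)]
    question = f"Where is {t.head} relative to {t.tail}?\n" + "\n".join(lines)
    return question, answer


if __name__ == "__main__":
    rng = random.Random(0)
    example = Triplet("Leeds", "north of", "Sheffield")  # illustrative fact
    for question, label in to_binary_pairs(example, rng):
        print(label, "|", question)
    mcq_question, mcq_answer = to_mcq(example, rng)
    print(mcq_question)
    print("Answer:", mcq_answer)
```

Shuffling the option order is one simple way to keep the correct letter from always landing in the same position, which matters when models show the kind of option bias the abstract reports. Composite (binary or ternary) questions would chain several such triplets and are not covered by this sketch.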
Anthology ID:
2026.lrec-main.417
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
5335–5348
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.417/
Cite (ACL):
Ayomide Abayomi, Jose G. Moreno, Karim Radouane, and Lynda Tamine. 2026. GeoBenchmark: Probing Large Language Models for Geo-Spatial Knowledge. International Conference on Language Resources and Evaluation, main:5335–5348.
Cite (Informal):
GeoBenchmark: Probing Large Language Models for Geo-Spatial Knowledge (Abayomi et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.417.pdf