A Benchmark for Evaluating Logical Reasoning in Georgian For Large Language Models

Irakli Koberidze; Archil Elizbarashvili; Magda Tsintsadze

Abstract

Advances in large language models (LLMs) have largely overlooked low-resource languages (LRLs), leaving a gap in evaluation benchmarks. To address this gap for Georgian, a Kartvelian language, we introduce GeoLogicQA, a novel, manually curated benchmark of 100 questions that assesses LLMs’ logical and inferential reasoning. The questions cover syllogistic deduction, inferential reading comprehension, common-sense reasoning, and arithmetic. They are adapted from challenging sources such as the Kangaroo Mathematics Competition and validated by native Georgian speakers for linguistic nuance. Initial evaluations of state-of-the-art LLMs (Gemini 2.5 Flash, DeepSeek-V3, Grok-3, GPT-4o) show average accuracies ranging from 64% to 83%, significantly exceeding the human baseline of 47%. Although these results demonstrate strong reasoning potential, error analysis reveals persistent challenges in multi-step combinatorial and highly constrained inferential tasks. GeoLogicQA is a public resource for tracking progress and diagnosing weaknesses of LLMs in Georgian. We plan to expand the benchmark and establish a public leaderboard to foster continuous improvement.
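The abstract reports one accuracy figure per model but does not describe the scoring procedure. As a minimal sketch only, assuming each question has a single gold answer and predictions are scored by normalized exact match, per-model accuracy could be computed as below. The `Item` schema and the `normalize` and `accuracy` helpers are hypothetical illustrations, not the authors' evaluation code, and the toy items are made up rather than taken from GeoLogicQA.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark question with its gold answer (hypothetical schema)."""
    question_id: str
    gold: str


def normalize(answer: str) -> str:
    # Assumed light normalization: trim whitespace and casefold before comparing.
    return answer.strip().casefold()


def accuracy(items: list[Item], predictions: dict[str, str]) -> float:
    """Fraction of items whose prediction exactly matches the gold answer.

    Questions with no prediction are counted as wrong.
    """
    if not items:
        raise ValueError("no items to score")
    correct = sum(
        1
        for item in items
        if normalize(predictions.get(item.question_id, "")) == normalize(item.gold)
    )
    return correct / len(items)


# Toy example with made-up items (not GeoLogicQA data).
items = [Item("q1", "B"), Item("q2", "12"), Item("q3", "D"), Item("q4", "yes")]
preds = {"q1": "b", "q2": "12", "q3": "A"}  # q4 left unanswered
print(f"{accuracy(items, preds):.0%}")  # 50%
```

On a 100-question set such as the one described above, each question is worth exactly one percentage point of accuracy for a single scored pass.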

Anthology ID: 2025.lowresnlp-1.13
Volume: Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Ernesto Luis Estevanell-Valladares, Alicia Picazo-Izquierdo, Tharindu Ranasinghe, Besik Mikaberidze, Simon Ostermann, Daniil Gurgurov, Philipp Mueller, Claudia Borg, Marián Šimko
Venues: LowResNLP | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 121–130
URL: https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.13/
Cite (ACL): Irakli Koberidze, Archil Elizbarashvili, and Magda Tsintsadze. 2025. A Benchmark for Evaluating Logical Reasoning in Georgian For Large Language Models. In Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages, pages 121–130, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): A Benchmark for Evaluating Logical Reasoning in Georgian For Large Language Models (Koberidze et al., LowResNLP 2025)
PDF: https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.13.pdf