GUIR at SemEval-2026 Task 7: Probing Cultural Knowledge in LLMs via Multi-Agent Debate

Reihaneh Iranmanesh, Ophir Frieder, Nazli Goharian


Abstract
We present the GUIR system for SemEval-2026 Task 7, Everyday Knowledge Across Diverse Languages and Cultures, which probes the extent to which general-purpose LLMs encode cultural knowledge without any culture-specific supervision or fine-tuning. Our system addresses two tracks built on the BLEnD benchmark. For the short-answer question (SAQ) track, we employ zero-shot prompting with gpt-4.1, achieving 55.5% accuracy across 61 language locales. For the multiple-choice question (MCQ) track, we propose a three-stage pipeline: (1) zero-shot chain-of-thought inference with gpt-5-mini, (2) cross-locale majority voting to correct inconsistent predictions, and (3) a multi-agent debate protocol in which three LLM instances argue and adjudicate over residual errors. This pipeline achieves 97.47% overall accuracy across 30 locales, ranking first among all submitted systems on the MCQ track. We further conduct a targeted human evaluation on the Persian locale, revealing that BLEnD’s lemma-matching scorer systematically underestimates model performance: human annotators score the system 18 percentage points higher than the lemma-matching evaluation does. This gap highlights the need for evaluation methods better suited to morphologically rich languages such as Persian.
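
As a rough illustration of stage (2), the sketch below shows one way cross-locale majority voting could work. It is not the authors' implementation: the abstract does not specify how items are aligned across locales, so the sketch assumes each prediction carries a hypothetical shared `question_id` and that items with the same id share the same correct option. The function name `majority_vote`, the tuple layout, and the strict-majority rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Replace each prediction with the strict-majority answer for its question.

    predictions: list of (question_id, locale, answer) tuples.
    The shared question_id is a hypothetical alignment key, not taken from the paper.
    """
    # Collect every locale's answer for each question.
    answers_by_question = defaultdict(list)
    for question_id, _locale, answer in predictions:
        answers_by_question[question_id].append(answer)

    corrected = []
    for question_id, locale, answer in predictions:
        votes = answers_by_question[question_id]
        top_answer, top_count = Counter(votes).most_common(1)[0]
        # Override only when more than half of the locales agree;
        # on ties or weak pluralities, keep the original prediction.
        if top_count * 2 > len(votes):
            corrected.append((question_id, locale, top_answer))
        else:
            corrected.append((question_id, locale, answer))
    return corrected


example = [
    ("q1", "fa-IR", "B"),
    ("q1", "en-US", "B"),
    ("q1", "ar-DZ", "C"),
    ("q2", "fa-IR", "A"),
    ("q2", "en-US", "D"),
]
result = majority_vote(example)
```

In this example the dissenting `q1` prediction is changed to `B`, while the tied `q2` predictions are left untouched. Under the pipeline described above, such unresolved items would be candidates for the debate stage.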
Anthology ID:
2026.semeval-1.438
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
3549–3561
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.438/
Cite (ACL):
Reihaneh Iranmanesh, Ophir Frieder, and Nazli Goharian. 2026. GUIR at SemEval-2026 Task 7: Probing Cultural Knowledge in LLMs via Multi-Agent Debate. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 3549–3561, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
GUIR at SemEval-2026 Task 7: Probing Cultural Knowledge in LLMs via Multi-Agent Debate (Iranmanesh et al., SemEval 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.438.pdf
Supplementary material:
 2026.semeval-1.438.SupplementaryMaterial.zip