GENOME: A New Geopolitical Event Methodology and Dataset using Large Language Models

Alessandro Dell’Orto, Jesse Kommandeur


Abstract
Quantitative research in International Relations relies heavily on structured event data, yet existing automated datasets lack up-to-date coverage of both conflictual and cooperative interactions. We introduce GENOME (Geopolitical Event News Observatory, Mapping, and Extraction), a dataset automatically extracted from newswire data that implements PLOVER’s 16 event types and extends its Actor–Recipient schema with a Third Party role to capture multilateral relations. GENOME’s pipeline comprises event extraction, ontology-based classification, entity normalization, and deduplication, and it uses GPT models with one-shot prompting and enforced structured outputs. We compare GENOME with the POLECAT dataset over a five-month overlap period in terms of event volume, temporal dynamics, and geographical coverage. The two datasets align closely on conflict event types, but GENOME captures a more balanced distribution of cooperative events, particularly verbal interactions, which are nearly absent from POLECAT. GENOME also improves temporal precision by attributing each event to its inferred date of occurrence rather than its publication date, and it effectively deduplicates heavily covered events.
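The extended Actor–Recipient–Third Party schema and the deduplication step mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the `Event` record, its field names, and the `dedup_key` and `deduplicate` functions are hypothetical and do not reproduce GENOME's actual data format or algorithm, and the exact-match key is a deliberately simple stand-in for whatever matching the authors use. The key includes the inferred occurrence date rather than the publication date, so articles published on different days about the same event collapse into one record.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical record layout; field names are illustrative, not GENOME's schema.
    event_type: str             # a PLOVER event type label, e.g. "CONSULT"
    actor: str                  # normalized actor name
    recipient: str              # normalized recipient name
    third_party: Optional[str]  # optional Third Party role (GENOME's extension)
    event_date: date            # inferred date of occurrence
    publication_date: date      # date the source article was published
    source_id: str              # identifier of the source article


def dedup_key(e: Event) -> tuple:
    # Hypothetical exact-match key: same type, participants, and occurrence date.
    # publication_date is deliberately excluded so that reports of the same
    # event published on different days map to the same key.
    return (e.event_type, e.actor, e.recipient, e.third_party, e.event_date)


def deduplicate(events: Iterable[Event]) -> list[Event]:
    """Keep the first event seen for each key (dicts preserve insertion order)."""
    seen: dict[tuple, Event] = {}
    for e in events:
        seen.setdefault(dedup_key(e), e)
    return list(seen.values())


# Usage: three reports, two of which describe the same bilateral meeting.
reports = [
    Event("CONSULT", "France", "Germany", None, date(2025, 3, 4), date(2025, 3, 4), "a1"),
    Event("CONSULT", "France", "Germany", None, date(2025, 3, 4), date(2025, 3, 5), "a2"),
    Event("CONSULT", "France", "Germany", "Italy", date(2025, 3, 4), date(2025, 3, 5), "a3"),
]
unique = deduplicate(reports)
print(len(unique))  # 2
```

Here the second report is dropped as a duplicate of the first despite its later publication date, while the third survives because its Third Party differs. A production pipeline would likely need fuzzier matching (for example, tolerating small date discrepancies), which this sketch does not attempt.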
Anthology ID: 2026.eeuca-1.9
Volume: Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ali Hürriyetoğlu, Surendrabikram Thapa, Hristo Tanev
Venues: EEUCA | WS
Publisher: Association for Computational Linguistics
Pages: 83–95
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.eeuca-1.9/
Cite (ACL): Alessandro Dell’Orto and Jesse Kommandeur. 2026. GENOME: A New Geopolitical Event Methodology and Dataset using Large Language Models. In Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026), pages 83–95, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): GENOME: A New Geopolitical Event Methodology and Dataset using Large Language Models (Dell’Orto & Kommandeur, EEUCA 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.eeuca-1.9.pdf