Mind the Query: A Benchmark Dataset towards Text2Cypher Task

Vashu Chauhan; Shobhit Raj; Shashank Mujumdar; Avirup Saha; Anannay Jain

Abstract

We present a high-quality, multi-domain dataset for the Text2Cypher task, which translates natural language (NL) questions into executable Cypher queries over graph databases. The dataset comprises 27,529 NL queries and corresponding Cypher queries spanning 11 real-world graph datasets, each accompanied by its graph database for grounded query execution. To ensure correctness, the queries are validated through a rigorous pipeline that combines automated schema, runtime, and value checks with manual review for logical correctness. Queries are further categorized by complexity to support fine-grained evaluation. We have released our benchmark dataset, along with code to replicate our data synthesis pipeline on new graph datasets, supporting extensibility and future research on Text2Cypher.

Anthology ID: 2025.emnlp-industry.133
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1890–1905
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.133/
Cite (ACL): Vashu Chauhan, Shobhit Raj, Shashank Mujumdar, Avirup Saha, and Anannay Jain. 2025. Mind the Query: A Benchmark Dataset towards Text2Cypher Task. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1890–1905, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): Mind the Query: A Benchmark Dataset towards Text2Cypher Task (Chauhan et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.133.pdf
