GQLBench: A Large-Scale Cross-Domain, Cross-Dialect Benchmark for NL2GQL
Yanning Su, Yuhang Zhou, Yang Fang, Sen Liu, Guangnan Ye, Hongfeng Chai
Abstract
Despite growing interest in NL2GQL, benchmarking progress has been constrained by the lack of resources that are simultaneously large-scale, cross-domain, and cross-dialect. To address this gap, we present **GQLBench**, a new benchmark built through an automated and scalable framework that integrates NL2SQL-to-NL2GQL conversion with graph-native data generation. GQLBench supports execution-based evaluation on both Cypher and ISO-GQL, covering hundreds of graph databases and over 20k natural language questions for each dialect. By combining converted data from mature NL2SQL resources with synthetic graph-specific queries, it captures both schema diversity from real-world relational sources and graph-native reasoning challenges, including long paths and cycles. Beyond overall performance comparison, GQLBench also enables fine-grained evaluation across dialects, graph patterns, and query complexity. Experiments on advanced LLMs show that even strong proprietary models struggle on GQLBench, with gemini-3-flash achieving only 35.40% average execution accuracy across the two dialects. Our data and code are available at https://github.com/qxssadf/GQLBench.
- Anthology ID:
- 2026.acl-long.1476
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 31989–32014
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1476/
- Cite (ACL):
- Yanning Su, Yuhang Zhou, Yang Fang, Sen Liu, Guangnan Ye, and Hongfeng Chai. 2026. GQLBench: A Large-Scale Cross-Domain, Cross-Dialect Benchmark for NL2GQL. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 31989–32014, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- GQLBench: A Large-Scale Cross-Domain, Cross-Dialect Benchmark for NL2GQL (Su et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1476.pdf