Bin Chen

Other people with similar names: Bin Chen

Unverified author pages with similar names: Bin Chen


2025

Can today’s Text-to-SQL benchmarks still stretch modern LLMs? We argue no. Spider1.0 and BIRD, painstakingly hand-built, remain small, costly, and skewed toward middle complex SQL. Meanwhile, LLM-generated corpora are inexpensive but often superficial and fragile suffering from shallow nesting, semantic drift, template fatigue, and insufficient quality check.We address this gap with a Chain-of-Verifications framework that turns a handful of expert-labelled seeds into a large, reliably checked dataset at a fraction of the usual cost. The resulting corpus, AIGT2S, delivers: (1)18k Question–SQL pairs across 113 databases, 41–77% larger than current English sets; (2)55% queries in the Ultra band of our four-level difficulty taxonomy; (3)87.5% inter-annotator agreement; (4)≥80% labour and ≥98% monetary savings versus earlier efforts.Baselines including GPT-4o, Llama3, RESDSQL, and MAC-SQL, achieve at most 56% execution accuracy, indicating substantial room for improvement.