Shuai Wang
2026
When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors
Yuqing Yang | Qi Zhu | Zhen Han | Boran Han | Zhengyuan Shen | Shuai Wang | Vassilis N. Ioannidis | Huzefa Rangwala
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While large language models (LLMs) perform well on table tasks, they still make data referencing errors (DREs), i.e., incorrectly citing or omitting table values, despite understanding the table structure. Beyond final-answer accuracy, DREs directly compromise the correctness and reliability of intermediate reasoning steps. Yet prior studies have only offered limited, small-scale analyses. In this work, we present the first systematic evaluation of tabular data referencing errors across different models and tasks. Our results show that DREs occur across all tested models (1.7B to 20B parameters). Furthermore, we demonstrate that incorporating data referencing as a critic significantly improves answer accuracy by up to 12.0%, through critic-based filtering and rejection sampling. Finally, we trained a lightweight 4B-parameter critic model that achieves an average F1 score of 78.2% in detecting both in-distribution and out-of-distribution DREs, and effectively assists inference for larger models.
SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL
Harper Hua | Zhen Han | Zhengyuan Shen | Meng-Chieh Lee | Sheng Guan | Qi Zhu | Sullam Jeoung | Yueyan Chen | Yunfei Bai | Shuai Wang | Vassilis N. Ioannidis | Huzefa Rangwala
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While large language models (LLMs) have substantially improved Text-to-SQL generation, a pronounced gap remains between AI systems and human experts on challenging benchmarks such as BIRD-SQL. We argue this gap stems largely from the prevailing single-pass paradigm, which lacks the iterative reasoning, schema exploration, and error-correction behaviors that humans naturally employ. To address this limitation, we introduce SQL-Trail, a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent’s interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration. Across benchmarks, SQL-Trail sets a new state of the art and delivers strong data efficiency, up to 18× higher than prior single-pass RL state-of-the-art methods. Notably, our 7B and 14B models outperform substantially larger proprietary systems by 5% on average, underscoring the effectiveness of interactive, agentic workflows for robust Text-to-SQL generation.