Shuai Wang
2026
BoundRL: Efficient Token-level Structured Text Segmentation through Reinforced Boundary Generation
Haoyuan Li | Zhengyuan Shen | Sullam Jeoung | Yueyan Chen | Jiayu Li | Qi Zhu | Shuai Wang | Vassilis N. Ioannidis | Huzefa Rangwala
Findings of the Association for Computational Linguistics: ACL 2026
Structured texts – from technical reports to AI prompts – increasingly require segmentation into semantically meaningful components. Such texts often contain elements beyond plain language, such as code snippets, which conventional sentence-level segmentation methods cannot handle effectively. To address this, we propose BoundRL, a novel approach that jointly performs efficient token-level text segmentation and label prediction for long structured texts. Instead of generating the full text of each segment, it generates only the starting tokens and reconstructs the complete segments by locating these tokens within the original text, thereby reducing inference costs by 90% and minimizing hallucination. To train models for boundary generation, BoundRL performs reinforcement learning with verifiable rewards (RLVR) that jointly optimizes document reconstruction fidelity and semantic alignment. It further mitigates entropy collapse by perturbing segment boundaries and labels to construct intermediate candidates that serve as stepping stones toward higher-quality solutions. Experiments show that BoundRL enables small language models (1.7B parameters) to outperform few-shot prompting with much larger models, as well as SFT and standard RLVR baselines, on complex prompts used for LLM applications.
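The reconstruction step described in this abstract (generate only each segment's starting tokens, then recover the full segment from the source text) can be sketched as follows. This is a minimal illustration under stated assumptions, not the BoundRL implementation: the function name `reconstruct_segments`, the use of string prefixes in place of model tokens, the left-to-right search, and the policy of skipping prefixes that cannot be located are all hypothetical choices made here.

```python
from typing import List, Tuple


def reconstruct_segments(
    text: str, boundaries: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Hypothetical helper: rebuild (label, segment_text) pairs from
    predicted (label, start_prefix) pairs.

    Each prefix is searched for left to right, starting just after the
    previous match, and the original text is cut between consecutive
    matches. Segment text is therefore copied from the source, never
    generated.
    """
    starts: List[Tuple[int, str]] = []
    cursor = 0
    for label, prefix in boundaries:
        pos = text.find(prefix, cursor)
        if pos == -1:
            # Assumed policy (not from the paper): drop a boundary whose
            # prefix cannot be found after the previous boundary.
            continue
        starts.append((pos, label))
        cursor = pos + 1  # the next boundary must start strictly later

    segments: List[Tuple[str, str]] = []
    for i, (pos, label) in enumerate(starts):
        # A segment runs until the next located boundary, or to the end of the text.
        end = starts[i + 1][0] if i + 1 < len(starts) else len(text)
        segments.append((label, text[pos:end]))
    return segments


if __name__ == "__main__":
    prompt = "You are a helpful assistant.\nRules:\n1. Be concise.\nExample: print('hi')"
    predicted = [("role", "You are"), ("constraints", "Rules:"), ("example", "Example:")]
    for label, segment in reconstruct_segments(prompt, predicted):
        print(label, repr(segment))
```

Because every segment is sliced out of the original string, the output can only contain source characters, which illustrates why generating boundaries instead of full segments limits hallucination and output length. In this sketch, any text before the first located boundary is simply dropped.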
SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL
Harper Hua | Zhen Han | Zhengyuan Shen | Meng-Chieh Lee | Sheng Guan | Qi Zhu | Sullam Jeoung | Yueyan Chen | Yunfei Bai | Shuai Wang | Vassilis N. Ioannidis | Huzefa Rangwala
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While large language models (LLMs) have substantially improved Text-to-SQL generation, a pronounced gap remains between AI systems and human experts on challenging benchmarks such as BIRD-SQL. We argue this gap stems largely from the prevailing single-pass paradigm, which lacks the iterative reasoning, schema exploration, and error-correction behaviors that humans naturally employ. To address this limitation, we introduce SQL-Trail, a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent’s interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration. Across benchmarks, SQL-Trail sets a new state of the art and delivers strong data efficiency, up to 18× higher than that of prior single-pass RL state-of-the-art methods. Notably, our 7B and 14B models outperform substantially larger proprietary systems by 5% on average, underscoring the effectiveness of interactive, agentic workflows for robust Text-to-SQL generation.
When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors
Yuqing Yang | Qi Zhu | Zhen Han | Boran Han | Zhengyuan Shen | Shuai Wang | Vassilis N. Ioannidis | Huzefa Rangwala
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While large language models (LLMs) perform well on table tasks, they still make data referencing errors (DREs), i.e., incorrectly citing or omitting table values, despite understanding the table structure. Beyond final-answer accuracy, DREs directly compromise the correctness and reliability of intermediate reasoning steps, yet prior studies have offered only limited, small-scale analyses. In this work, we present the first systematic evaluation of tabular data referencing errors across different models and tasks. Our results show that DREs occur in all tested models (1.7B to 20B parameters). Furthermore, we demonstrate that using data referencing as a critic, through critic-based filtering and rejection sampling, significantly improves answer accuracy, by up to 12.0%. Finally, we train a lightweight 4B-parameter critic model that achieves an average F1 score of 78.2% in detecting both in-distribution and out-of-distribution DREs, and that effectively assists inference for larger models.
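Critic-based filtering, as described in this abstract, amounts to sampling several candidate answers and keeping one that the critic accepts. The sketch below illustrates that loop with a deliberately crude, hypothetical rule-based critic (`cited_values_ok`) that flags a candidate when it cites a number not present in the table. The paper instead trains a 4B-parameter model as the critic; this toy rule cannot detect omitted values and would wrongly reject legitimately derived numbers such as sums. The names `cited_values_ok` and `critic_filter` are illustrative only.

```python
import re
from typing import Callable, List, Optional, Sequence

# Matches integers and decimals such as "42", "-3", or "7.5".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def cited_values_ok(reasoning: str, table: Sequence[Sequence[str]]) -> bool:
    """Toy critic (hypothetical): accept only if every number cited in the
    reasoning appears verbatim as a cell of the table.

    This catches miscited values but not omissions, and it would also
    reject numbers the model correctly computed from the table.
    """
    cells = {cell.strip() for row in table for cell in row}
    return all(num in cells for num in NUMBER.findall(reasoning))


def critic_filter(
    candidates: List[str], critic: Callable[[str], bool]
) -> Optional[str]:
    """Return the first sampled candidate the critic accepts, or None if
    every candidate is rejected (the caller could then sample again)."""
    for cand in candidates:
        if critic(cand):
            return cand
    return None


if __name__ == "__main__":
    table = [["city", "population"], ["Oslo", "709000"], ["Bergen", "291000"]]
    samples = [
        "Oslo has 790000 people, so it is the largest.",  # miscited value
        "Oslo has 709000 people and Bergen has 291000, so Oslo is the largest.",
    ]
    print(critic_filter(samples, lambda c: cited_values_ok(c, table)))
```

Swapping the toy rule for a learned critic only changes the `critic` callable; the filtering loop itself stays the same.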