Shuyi Wang
Also published as: 舒怡 王
2026
PExA: Parallel Exploration Agent for Complex Text-to-SQL
Tanmay Parekh | Ella Hofmann-Coyle | Shuyi Wang | Sachith Sri Ram Kothur | Srivas Prasad | Yunmo Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
LLM-based agents for text-to-SQL often struggle with a latency-performance trade-off, where performance improvements come at the cost of latency or vice versa. We reformulate text-to-SQL generation through the lens of software test coverage: the original query is accompanied by a suite of test cases, simpler atomic SQL queries that are executed in parallel and together ensure semantic coverage of the original query. After iterating on test case coverage, the final SQL is generated only once enough information has been gathered, with the explored test case SQLs grounding the final generation. We validate our framework on Spider 2.0, a state-of-the-art benchmark for text-to-SQL, achieving a new state of the art with 70.2% execution accuracy.
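The idea of running simple, atomic SQL probes in parallel can be illustrated with a minimal sketch. This is not the PExA implementation: the helper names (`run_probe`, `explore_in_parallel`, `make_demo_db`), the SQLite backend, the thread pool, and the toy `orders` table are all assumptions made for illustration only.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_probe(db_path, sql):
    """Run one atomic probe query on its own connection (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    try:
        return sql, conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        # A failing probe is still useful information for the final generation step.
        return sql, f"error: {exc}"
    finally:
        conn.close()


def explore_in_parallel(db_path, probes, max_workers=4):
    """Execute all probe queries concurrently and map each query to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda q: run_probe(db_path, q), probes))


def make_demo_db():
    """Create a small on-disk SQLite database with a toy table (assumed schema)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.executescript(
        "CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);"
        "INSERT INTO orders VALUES (1, 'EU', 10.0), (2, 'US', 25.0), (3, 'EU', 5.0);"
    )
    conn.commit()
    conn.close()
    return path
```

In such a setup, the collected probe results (including error messages) would be passed back to the model as grounding context before it writes the final query; how PExA actually selects probes and decides that coverage is sufficient is not specified in the abstract.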
2025
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies
Terrance Liu | Shuyi Wang | Daniel Preotiuc-Pietro | Yash Chandarana | Chirag Gupta
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
While large language models (LLMs) achieve strong performance on text-to-SQL parsing, they sometimes exhibit unexpected failures in which they are confidently incorrect. Building trustworthy text-to-SQL systems thus requires eliciting reliable uncertainty measures from the LLM. In this paper, we study the problem of providing a calibrated confidence score that conveys the likelihood of an output query being correct. Our work is the first to establish a benchmark for post-hoc calibration of LLM-based text-to-SQL parsing. In particular, we show that Platt scaling, a canonical method for calibration, provides substantial improvements over directly using raw model output probabilities as confidence scores. Furthermore, we propose a method for text-to-SQL calibration that leverages the structured nature of SQL queries to provide more granular signals of correctness, named “sub-clause frequency” (SCF) scores. Using multivariate Platt scaling (MPS), our extension of the canonical Platt scaling technique, we combine individual SCF scores into an overall accurate and calibrated score. Empirical evaluation on two popular text-to-SQL datasets shows that our approach of combining MPS and SCF yields further improvements in calibration and the related task of error detection over traditional Platt scaling.
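For readers unfamiliar with Platt scaling, the sketch below fits a logistic function that maps one or more raw scores to a calibrated probability of correctness, using held-out (score, label) pairs. With a single score per example this is classic Platt scaling; with several scores it is one plausible reading of a multivariate variant. The function names, the plain gradient-descent fit, and the idea of feeding several per-query scores as a feature vector are assumptions for illustration, not the authors' implementation of MPS or SCF.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(features, labels, lr=0.5, steps=2000):
    """Fit weights w and bias b so that sigmoid(w . x + b) approximates P(correct | x).

    features: list of score vectors (length 1 for classic Platt scaling).
    labels:   list of 0/1 correctness labels from a held-out calibration set.
    """
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Gradient of the log loss with respect to the logit is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def calibrated_confidence(w, b, x):
    """Map a raw score vector to a calibrated confidence in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

In practice the parameters would be fit on a calibration split separate from the evaluation data, and a library routine such as logistic regression from a standard toolkit would normally replace the hand-written gradient descent used here.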
2020
基于BERT的端到端中文篇章事件抽取(A BERT-based End-to-End Model for Chinese Document-level Event Extraction)
Hongkuan Zhang (张洪宽) | Hui Song (宋晖) | Shuyi Wang (王舒怡) | Bo Xu (徐波)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
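Document-level event extraction detects events across an entire document, identifies the arguments each event contains, and assigns each argument a specific role. This paper proposes a BERT-based end-to-end model for Chinese documents in a restricted domain. During argument and role recognition, the model successively introduces the event type output by the preceding layer and entity embedding representations, which strengthens the joint representation of events, arguments, and roles in the text and improves the accuracy of identifying the arguments that belong to each event in a document. On this basis, the model uses title information and embedded representations of event five-tuples to separate main and subordinate events and to merge their arguments. Experiments show that the proposed method clearly improves over existing work.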