Shuyi Wang

Also published as: 舒怡


2025

pdf bib
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies
Terrance Liu | Shuyi Wang | Daniel Preotiuc-Pietro | Yash Chandarana | Chirag Gupta
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

While large language models (LLMs) achieve strong performance on text-to-SQL parsing, they sometimes exhibit unexpected failures in which they are confidently incorrect. Building trustworthy text-to-SQL systems thus requires eliciting reliable uncertainty measures from the LLM. In this paper, we study the problem of providing a calibrated confidence score that conveys the likelihood of an output query being correct. Our work is the first to establish a benchmark for post-hoc calibration of LLM-based text-to-SQL parsing. In particular, we show that Platt scaling, a canonical method for calibration, provides substantial improvements over directly using raw model output probabilities as confidence scores. Furthermore, we propose a method for text-to-SQL calibration that leverages the structured nature of SQL queries to provide more granular signals of correctness, named “sub-clause frequency” (SCF) scores. Using multivariate Platt scaling (MPS), our extension of the canonical Platt scaling technique, we combine individual SCF scores into an overall accurate and calibrated score. Empirical evaluation on two popular text-to-SQL datasets shows that our approach of combining MPS and SCF yields further improvements in calibration and the related task of error detection over traditional Platt scaling.

2020

pdf bib
基于BERT的端到端中文篇章事件抽取(A BERT-based End-to-End Model for Chinese Document-level Event Extraction)
Hongkuan Zhang (张洪宽) | Hui Song (宋晖) | Shuyi Wang (王舒怡) | Bo Xu (徐波)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

篇章级事件抽取研究从整篇文档中检测事件,识别出事件包含的元素并赋予每个元素特定的角色。本文针对限定领域的中文文档提出了基于BERT的端到端模型,在模型的元素和角色识别中依次引入前序层输出的事件类型以及实体嵌入表示,增强文本的事件、元素和角色关联表示,提高篇章中各事件所属元素的识别精度。在此基础上利用标题信息和事件五元组的嵌入式表示,实现主从事件的划分及元素融合。实验证明本文的方法与现有工作相比具有明显的提升。