Terrance Liu
2025
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies
Terrance Liu | Shuyi Wang | Daniel Preotiuc-Pietro | Yash Chandarana | Chirag Gupta
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
While large language models (LLMs) achieve strong performance on text-to-SQL parsing, they sometimes exhibit unexpected failures in which they are confidently incorrect. Building trustworthy text-to-SQL systems thus requires eliciting reliable uncertainty measures from the LLM. In this paper, we study the problem of providing a calibrated confidence score that conveys the likelihood of an output query being correct. Our work is the first to establish a benchmark for post-hoc calibration of LLM-based text-to-SQL parsing. In particular, we show that Platt scaling, a canonical method for calibration, provides substantial improvements over directly using raw model output probabilities as confidence scores. Furthermore, we propose a method for text-to-SQL calibration that leverages the structured nature of SQL queries to provide more granular signals of correctness, named “sub-clause frequency” (SCF) scores. Using multivariate Platt scaling (MPS), our extension of the canonical Platt scaling technique, we combine individual SCF scores into a single confidence score that is both accurate and calibrated. Empirical evaluation on two popular text-to-SQL datasets shows that combining MPS and SCF yields further improvements over traditional Platt scaling in both calibration and the related task of error detection.
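As a concrete illustration of the calibration setup the abstract describes, the sketch below fits classic Platt scaling (a logistic regression on one raw score) and a multivariate variant fit on several correctness signals at once. This is not the paper's implementation: the synthetic data, feature names such as `scf_scores`, and the use of scikit-learn's `LogisticRegression` are assumptions made purely for illustration.

```python
# Minimal sketch of (multivariate) Platt scaling for text-to-SQL confidence
# calibration. NOT the paper's implementation; all data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical held-out calibration set:
#   raw_logprob: LLM sequence log-probability for each generated query
#   scf_scores:  per-clause "sub-clause frequency" signals (one column per
#                clause type, e.g. SELECT / WHERE / GROUP BY) -- assumed names
#   correct:     1 if the query is correct, else 0
rng = np.random.default_rng(0)
n = 500
raw_logprob = rng.normal(-2.0, 1.0, size=(n, 1))
scf_scores = rng.uniform(0.0, 1.0, size=(n, 3))
correct = (rng.uniform(size=n) < 0.6).astype(int)

# Classic Platt scaling: logistic regression on the single raw score.
platt = LogisticRegression().fit(raw_logprob, correct)

# Multivariate Platt scaling (MPS): the same logistic model, but fit on
# several signals at once -- here the raw score plus the SCF features.
features = np.hstack([raw_logprob, scf_scores])
mps = LogisticRegression().fit(features, correct)

# Calibrated confidence for a new query is the predicted probability of
# correctness under the fitted logistic model.
new_x = np.hstack([rng.normal(-2.0, 1.0, (1, 1)),
                   rng.uniform(0.0, 1.0, (1, 3))])
print(f"Platt confidence: {platt.predict_proba(new_x[:, :1])[0, 1]:.3f}")
print(f"MPS confidence:   {mps.predict_proba(new_x)[0, 1]:.3f}")
```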
2021
Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data
Paul Pu Liang | Terrance Liu | Anna Cai | Michal Muszynski | Ryo Ishii | Nick Allen | Randy Auerbach | David Brent | Ruslan Salakhutdinov | Louis-Philippe Morency
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications for the early detection, intervention, and treatment of mental health disorders. One promising data source to help monitor human behavior is daily smartphone usage. However, care must be taken to summarize behaviors without identifying the user through personal (e.g., personally identifiable information) or protected (e.g., race, gender) attributes. In this paper, we study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors. Using computational models, we find that language and multimodal representations of mobile typed text (spanning typed characters, words, keystroke timings, and app usage) are predictive of daily mood. However, we find that models trained to predict mood often also capture private user identities in their intermediate representations. To tackle this problem, we evaluate approaches that obfuscate user identity while remaining predictive. By combining multimodal representations with privacy-preserving learning, we are able to push forward the performance-privacy frontier.
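The abstract does not name the specific obfuscation methods the paper evaluates, so the sketch below shows one standard approach to hiding user identity in intermediate representations: adversarial training with a gradient-reversal layer. The technique choice, model shapes, and all names here are hypothetical, chosen only to illustrate the general idea of staying predictive of mood while suppressing identity.

```python
# Illustrative sketch of identity obfuscation via gradient reversal.
# NOT necessarily one of the paper's evaluated methods; dummy data throughout.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # mobile features -> representation
mood_head = nn.Linear(32, 3)        # e.g. negative / neutral / positive daily mood
identity_head = nn.Linear(32, 100)  # adversary guessing which of 100 users

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(mood_head.parameters())
    + list(identity_head.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()

x = torch.randn(16, 64)                 # dummy batch of typed-text features
mood_y = torch.randint(0, 3, (16,))
user_y = torch.randint(0, 100, (16,))

for _ in range(10):
    z = encoder(x)
    # The mood loss trains encoder + mood head as usual; the reversed
    # gradient from the identity loss pushes the encoder to HIDE identity
    # while the identity head still tries its best to recover it.
    loss = ce(mood_head(z), mood_y) \
         + ce(identity_head(GradReverse.apply(z, 1.0)), user_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```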