Jesse C. Cresswell

2026

Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems
Kin Kwan Leung | Mouloud Belbahri | Yi Sui | Alex Labach | Xueying Zhang | Stephen Anthony Rose | Jesse C. Cresswell
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval-augmented generation (RAG) is a prevalent approach for building LLM-based question-answering systems that can take advantage of external knowledge databases. Due to the complexity of real-world RAG systems, there are many potential causes for erroneous outputs. Understanding the range of errors that can occur in practice is crucial for robust deployment. We present a new taxonomy of the error types that can occur in realistic RAG systems, examples of each, and practical advice for addressing them. Additionally, we curate a dataset of erroneous RAG responses annotated by error types. We then propose an auto-evaluation method aligned with our taxonomy that can be used in practice to track and address errors during development. Code and data are available at https://github.com/layer6ai-labs/rag-error-classification.

2025

pdf bib abs

MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation
Satya Krishna Gorti | Ilan Gofman | Zhaoyan Liu | Jiapeng Wu | Noël Vouitsis | Guangwei Yu | Jesse C. Cresswell | Rasa Hosseinzadeh
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Text-to-SQL generation enables non-experts to interact with databases via natural language. Recent advances rely on large closed-source models like GPT-4 that present challenges in accessibility, privacy, and latency. To address these issues, we focus on developing small, efficient, and open-source text-to-SQL models. We demonstrate the benefits of sampling multiple candidate SQL generations and propose our method, MSc-SQL, to critique them using associated metadata. Our sample critiquing model evaluates multiple outputs simultaneously, achieving state-of-the-art performance compared to other open-source models while remaining competitive with larger models at a much lower cost. Full code can be found at github.com/layer6ai-labs/msc-sql.

Co-authors

Kin Kwan Leung 1

Zhaoyan Liu 1

Stephen Anthony Rose 1

Yi Sui 1

Venues

EACL1
NAACL1

Fix author