Yijun Ge

2025

QuackIR: Retrieval in DuckDB and Other Relational Database Management Systems
Yijun Ge | Zijian Chen | Jimmy Lin
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Enterprises today are increasingly compelled to adopt dedicated vector databases for retrieval-augmented generation (RAG) in applications based on large language models (LLMs). As a potential alternative for these vector databases, we propose that organizations leverage existing relational databases for retrieval, which many have already deployed in their enterprise data lakes, thus minimizing additional complexity in their software stacks. To demonstrate the simplicity and feasibility of this approach, we present QuackIR, an information retrieval (IR) toolkit built on relational database management systems (RDBMSes), with integrations in DuckDB, SQLite, and PostgreSQL. Using QuackIR, we benchmark the sparse and dense retrieval capabilities of these popular RDBMSes and demonstrate that their effectiveness is comparable to baselines from established IR toolkits. Our results highlight the potential of relational databases as a simple option for RAG scenarios due to their established widespread usage and the easy integration of retrieval abilities. Our implementation is available at quackir.io.

Co-authors

Zijian Chen 1
Jimmy Lin 1

Venues

EMNLP 1
