Anupama Garani


2026

Retrieval-Augmented Generation (RAG) systems fail in diverse, poorly characterized ways that single-stage evaluation metrics cannot detect. We present a systematic taxonomy of 33 failure modes across 7 pipeline stages (ingestion, representation, retrieval, generation, evaluation, deployment, and agentic orchestration), constructed through a structured literature review of 48 sources spanning peer-reviewed publications and high-impact preprints. For each mode, we provide a formal definition, its observable manifestation, and an evidence grade on a three-level scale (Strong/Moderate/Limited). Our analysis reveals a critical asymmetry in research attention: retrieval and generation failures are comparatively well studied, while representation, evaluation, and agentic orchestration failures remain under-investigated despite occurring frequently in production. We identify 12 failure modes with no dedicated peer-reviewed empirical evidence, all 8 agentic modes among them, which constitutes an evidence desert in the fastest-growing RAG deployment paradigm. Compared to prior work enumerating 7 failure points (Barnett et al., 2024) or 16 error types within partial pipeline runs (Cresswell et al., 2025), our taxonomy uniquely spans the full pipeline, including agentic orchestration, and grades the evidence for every mode explicitly.