Decompose, Retrieve, Cite: A RAG Pipeline for Structured Report Generation from Technical Documentation
Himanshu Dhurve, Sreedath Panat, Rajat Dandekar, Raj Dandekar
Abstract
Retrieval-Augmented Generation (RAG) grounds language-model output in external knowledge, yet its application to dense technical documentation remains largely unexplored. Engineering software manuals pose compounding challenges: formulae are corrupted during PDF extraction, heterogeneous content types require different parsing treatment, and queries demand cross-document synthesis across multiple reference volumes. We present an end-to-end RAG system for OpenFOAM, an open-source computational fluid dynamics toolkit, operating in two modes. In single-query mode, a formula-preserving parser (Marker), adaptive header-aware chunking, two-stage dense-then-rerank retrieval, and a citation-enforcement prompt produce grounded, source-attributed answers across a 20-question benchmark. In report mode, a user prompt is decomposed into sub-questions via LLM planning; each sub-question undergoes independent retrieval and cross-encoder re-ranking, and the deduplicated chunk set is passed to a long-context generation call that produces a structured, multi-section report with inline citations. Evaluated on a 10-prompt golden set with a six-dimension LLM-as-a-judge framework, both pipelines achieve overall scores above 4.6/5.0 with perfect citation correctness (5.0/5.0). The decomposed pipeline demonstrates superior robustness (90% vs 70% judge success rate). Retrieval analysis using page-level ground truth reveals low absolute recall (<14%), identifying retrieval breadth as the primary bottleneck.
- Anthology ID:
- 2026.rag4reports-1.4
- Volume:
- Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, CA, USA
- Editors:
- Eugene Yang, Dawn Lawrie, Sean MacAvaney, James Mayfield, Luca Soldaini, Andrew Yates
- Venues:
- RAG4Reports | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 24–35
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.rag4reports-1.4/
- Cite (ACL):
- Himanshu Dhurve, Sreedath Panat, Rajat Dandekar, and Raj Dandekar. 2026. Decompose, Retrieve, Cite: A RAG Pipeline for Structured Report Generation from Technical Documentation. In Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026), pages 24–35, San Diego, CA, USA. Association for Computational Linguistics.
- Cite (Informal):
- Decompose, Retrieve, Cite: A RAG Pipeline for Structured Report Generation from Technical Documentation (Dhurve et al., RAG4Reports 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.rag4reports-1.4.pdf
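
The report-mode flow summarized in the abstract (per-sub-question retrieval, re-ranking, deduplication, then a single citation-enforcing generation prompt) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Chunk` type, all function names, the `k_first`/`k_final` parameters, and the prompt wording are hypothetical, and a toy token-overlap score stands in for both the dense retriever and the cross-encoder. The LLM planning step is assumed to have already produced the list of sub-questions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A scorer maps (question, chunk text) to a relevance score.
Scorer = Callable[[str, str], float]


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source: str  # hypothetical citation label, e.g. document name and page
    text: str


def token_overlap(query: str, text: str) -> float:
    """Toy stand-in scorer: fraction of query tokens that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve_then_rerank(question: str, corpus: List[Chunk],
                         first_stage: Scorer, reranker: Scorer,
                         k_first: int = 4, k_final: int = 2) -> List[Chunk]:
    """Two-stage retrieval: broad candidate selection, then re-ranking."""
    candidates = sorted(corpus, key=lambda c: first_stage(question, c.text),
                        reverse=True)[:k_first]
    return sorted(candidates, key=lambda c: reranker(question, c.text),
                  reverse=True)[:k_final]


def gather_report_context(sub_questions: List[str], corpus: List[Chunk],
                          first_stage: Scorer, reranker: Scorer,
                          k_first: int = 4, k_final: int = 2) -> List[Chunk]:
    """Retrieve independently per sub-question, then deduplicate by chunk id."""
    seen: Dict[str, Chunk] = {}
    for question in sub_questions:
        for chunk in retrieve_then_rerank(question, corpus, first_stage,
                                          reranker, k_first, k_final):
            seen.setdefault(chunk.chunk_id, chunk)  # keep first occurrence
    return list(seen.values())


def build_report_prompt(user_prompt: str, chunks: List[Chunk]) -> str:
    """Number each chunk so the generator can cite it inline as [n]."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Write a structured, multi-section report. "
        "Support every claim with an inline citation such as [1].\n\n"
        f"Request: {user_prompt}\n\nSources:\n{context}"
    )
```

In a real system the two scorer arguments would be replaced by an embedding-similarity search and a cross-encoder model; deduplicating by chunk id before the generation call keeps overlapping sub-questions from spending context on repeated passages.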