SchemaRAG: Dynamic Large Schema Reduction for LLM-driven Structured Information Extraction

Sin Yu Bonnie Ho, Arlie Coles, Erik Larsson, Eric Marshall, Nathan Bodenstab, Paul Vozila


Abstract
Extracting structured data from unstructured text using large language models (LLMs) becomes challenging when the target schemas are large and complex. In such cases, including the full schema in the prompt increases cost and latency, risks lost-in-the-middle performance degradation, and can exceed context length limits. We propose SchemaRAG, a retrieval-augmented generation (RAG) framework that dynamically prunes the output schema space for schema-conditioned information extraction tasks by leveraging schema metadata and few-shot examples (when available). We evaluate SchemaRAG on real-world healthcare and e-commerce datasets. Our results show that SchemaRAG can achieve up to an 8.8% increase in micro-F1, a 47% reduction in latency, and a 48% reduction in token costs, demonstrating its practicality for large-schema extraction.
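As a rough illustration of the schema-pruning idea described in the abstract, the sketch below scores each schema field against the input text and keeps only the best-matching fields for the prompt. The abstract does not specify SchemaRAG's retriever or metadata format, so everything here is an assumption: the function name `prune_schema`, the field-name-to-description schema layout, the stopword list, and the token-overlap scoring (a simple stand-in for whatever retrieval method the paper actually uses).

```python
import re
from collections import Counter

# Small hypothetical stopword list so common words do not inflate scores.
STOPWORDS = {"a", "an", "the", "of", "in", "to", "for", "and", "or", "at"}

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def prune_schema(text, schema, top_k=2):
    """Keep up to top_k schema fields whose metadata best overlaps the text.

    `schema` maps field name -> description (an assumed metadata format).
    Token overlap is a stand-in for a real retriever.
    """
    text_counts = Counter(tokenize(text))
    scores = {}
    for name, description in schema.items():
        field_tokens = set(tokenize(name + " " + description)) - STOPWORDS
        scores[name] = sum(text_counts[t] for t in field_tokens)
    # Rank by score (descending); break ties by name for determinism.
    ranked = sorted(schema, key=lambda n: (-scores[n], n))
    kept = [n for n in ranked[:top_k] if scores[n] > 0]
    return {n: schema[n] for n in kept}

schema = {
    "medication_dose": "dose of a prescribed medication in mg",
    "blood_pressure": "systolic and diastolic blood pressure reading",
    "shipping_address": "postal address for order delivery",
    "allergy": "documented allergen or drug hypersensitivity",
}
note = "Patient blood pressure 140/90; started medication at a dose of 10 mg."
pruned = prune_schema(note, schema, top_k=2)
```

Only the fields in `pruned` would then be included in the extraction prompt, which is where the token and latency savings reported in the abstract would plausibly come from. A production system would likely replace the overlap score with embedding-based retrieval and could also use few-shot examples as retrieval keys.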
Anthology ID:
2026.acl-industry.78
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Yunyao Li, Georg Rehm, Mei Tu
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1114–1127
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-industry.78/
Cite (ACL):
Sin Yu Bonnie Ho, Arlie Coles, Erik Larsson, Eric Marshall, Nathan Bodenstab, and Paul Vozila. 2026. SchemaRAG: Dynamic Large Schema Reduction for LLM-driven Structured Information Extraction. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1114–1127, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
SchemaRAG: Dynamic Large Schema Reduction for LLM-driven Structured Information Extraction (Ho et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-industry.78.pdf