EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs

Ravi K. Rajendran, Biplob Debnath, Murugan Sankaradass, Srimat Chakradhar


Abstract
Enterprises are increasingly adopting Generative AI applications to extract insights from large volumes of multimodal documents in domains such as finance, law, healthcare, and industry. These documents contain both structured and unstructured data (images, charts, handwritten text, etc.) and require robust AI systems for effective retrieval and comprehension. Recent advances in Retrieval-Augmented Generation (RAG) frameworks and Vision-Language Models (VLMs) have improved retrieval performance on multimodal documents by processing pages as images. However, large-scale deployment remains challenging because LLM API usage is costly and image-based page processing has slower inference than text-based processing. To address these challenges, we propose EcoDoc, a cost-effective multimodal document processing system that dynamically selects the processing modality for each page, image or text, based on page characteristics and query intent. Our experimental evaluation on the TAT-DQA and DocVQA benchmarks shows that EcoDoc reduces average query processing latency by up to 2.29× and cost by up to 10× without compromising accuracy.
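The per-page modality selection described in the abstract can be illustrated with a minimal sketch. This is not EcoDoc's actual method, which the abstract does not specify. The `Page` fields, the keyword list, and the decision rule are all hypothetical stand-ins for "page characteristics" and "query intent".

```python
from dataclasses import dataclass


@dataclass
class Page:
    text: str                 # extracted text for the page (may be empty)
    has_visual_content: bool  # hypothetical flag: charts, images, handwriting


# Hypothetical keyword list used as a crude proxy for visual query intent.
VISUAL_QUERY_HINTS = ("chart", "figure", "graph", "image", "diagram", "signature")


def query_needs_visuals(query: str) -> bool:
    """Return True if the query mentions any visual-content keyword."""
    q = query.lower()
    return any(hint in q for hint in VISUAL_QUERY_HINTS)


def choose_modality(page: Page, query: str) -> str:
    """Pick 'image' or 'text' for one page (illustrative rule, an assumption)."""
    # No usable text was extracted, so the page can only be read as an image.
    if not page.text.strip():
        return "image"
    # Use the costlier image path only when the page has visual content
    # and the query appears to ask about it.
    if page.has_visual_content and query_needs_visuals(query):
        return "image"
    # Otherwise fall back to the cheaper, faster text path.
    return "text"
```

Under these assumptions, a page with a chart is sent down the image path only when the query asks about the chart. All other pages take the text path, which is where the latency and cost savings would come from.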
Anthology ID:
2025.acl-industry.109
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Georg Rehm, Yunyao Li
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1530–1537
URL:
https://preview.aclanthology.org/landing_page/2025.acl-industry.109/
Cite (ACL):
Ravi K. Rajendran, Biplob Debnath, Murugan Sankaradass, and Srimat Chakradhar. 2025. EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 1530–1537, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs (Rajendran et al., ACL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.acl-industry.109.pdf