AID-Agent: An LLM-Agent for Advanced Extraction and Integration of Documents

Bin Li, Jannis Conen, Felix Aller


Abstract
Extracting structured information from complex unstructured documents is an essential but challenging task in today’s industrial applications. Complex document content, such as irregular table layouts and cross-references, can lead to unexpected failures in classical extractors based on Optical Character Recognition (OCR) or Large Language Models (LLMs). In this paper, we propose the AID-agent framework, which integrates OCR with LLMs to enhance text processing capabilities. Specifically, the AID-agent maintains a customizable toolset that provides external processing tools for complex documents and can be extended with domain-specific and task-specific tools. In an empirical validation on a real-world use case, the proposed AID-agent outperforms conventional OCR-based and LLM-based approaches.
Anthology ID:
2025.realm-1.6
Volume:
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ehsan Kamalloo, Nicolas Gontier, Xing Han Lu, Nouha Dziri, Shikhar Murty, Alexandre Lacoste
Venues:
REALM | WS
Publisher:
Association for Computational Linguistics
Pages:
80–88
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.realm-1.6/
DOI:
10.18653/v1/2025.realm-1.6
Cite (ACL):
Bin Li, Jannis Conen, and Felix Aller. 2025. AID-Agent: An LLM-Agent for Advanced Extraction and Integration of Documents. In Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025), pages 80–88, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
AID-Agent: An LLM-Agent for Advanced Extraction and Integration of Documents (Li et al., REALM 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.realm-1.6.pdf