Felix Aller


2025

Extracting structured information from complex unstructured documents is an essential but challenging task in today’s industrial applications. Complex document content, e.g., irregular table layout, and cross-referencing, can lead to unexpected failures in classical extractors based on Optical Character Recognition (OCR) or Large Language Models (LLMs). In this paper, we propose the AID-agent framework that synergistically integrates OCR with LLMs to enhance text processing capabilities. Specifically, the AID-agent maintains a customizable toolset, which not only provides external processing tools for complex documents but also enables customization for domain and task-specific tool requirements. In the empirical validation on a real-world use case, the proposed AID-agent demonstrates superior performance compared to conventional OCR and LLM-based approaches.