DELOC: Document Element Localizer

Hammad Ayyubi, Puneet Mathur, Mehrab Tanjim, Vlad I Morariu


Abstract
Editing documents and PDFs using natural language instructions is desirable for many reasons: ease of use, greater accessibility for non-technical users, and support for creativity. To do this automatically, a system must first understand the user’s intent and convert it into an executable plan or command, and then identify or localize the elements that the user wants to edit. While existing methods can accomplish these tasks, a major bottleneck is their inability to ground the spatial edit location effectively. We address this gap with our proposed system, DELOC (Document Element LOCalizer). DELOC adapts the grounding capabilities of existing Multimodal Large Language Models (MLLMs) from natural images to PDFs. This adaptation involves two novel contributions: 1) synthetically generating PDF-grounding instruction-tuning data from partially annotated datasets; and 2) cleaning the synthetic data via Code-NLI, an NLI-inspired process that cleans data using generated Python code. DELOC achieves a >3x zero-shot improvement over the next best Multimodal LLM, GPT-4o.
Anthology ID:
2025.emnlp-main.1585
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
31126–31135
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1585/
Cite (ACL):
Hammad Ayyubi, Puneet Mathur, Mehrab Tanjim, and Vlad I Morariu. 2025. DELOC: Document Element Localizer. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31126–31135, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
DELOC: Document Element Localizer (Ayyubi et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1585.pdf
Checklist:
 2025.emnlp-main.1585.checklist.pdf