@inproceedings{ayyubi-etal-2025-deloc,
title = "{DELOC}: Document Element Localizer",
author = "Ayyubi, Hammad and
Mathur, Puneet and
Tanjim, Mehrab and
Morariu, Vlad I",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1585/",
pages = "31126--31135",
ISBN = "979-8-89176-332-6",
abstract = "Editing documents and PDFs using natural language instructions is desirable for many reasons: ease of use, increased accessibility for non-technical users, and creativity. To do this automatically, a system must first understand the user{'}s intent and convert it into an executable plan or command, and then identify or localize the elements that the user wants to edit. While there exist methods that can accomplish these tasks, a major bottleneck in these systems is the inability to ground the spatial edit location effectively. We address this gap through our proposed system, DELOC (Document Element LOCalizer). DELOC adapts the grounding capabilities of existing Multimodal Large Language Models (MLLMs) from natural images to PDFs. This adaptation involves two novel contributions: 1) synthetically generating PDF-grounding instruction tuning data from partially annotated datasets; and 2) synthetic data cleaning via Code-NLI, an NLI-inspired process to clean data using generated Python code. The effectiveness of DELOC is apparent in the {\ensuremath{>}}3x zero-shot improvement it achieves over the next best Multimodal LLM, GPT-4o."
}