Talk to Your Slides: High-Efficiency Slide Editing via Language-Driven Structured Data Manipulation

Kyudan Jung, Hojun Cho, Jooyeol Yun, Soyoung Yang, Jaehyeok Jang, Jaegul Choo


Abstract
Editing presentation slides is a frequent yet tedious task, ranging from creative layout design to repetitive text maintenance. While recent GUI-based agents powered by Multimodal LLMs (MLLMs) excel at tasks requiring visual perception, such as spatial layout adjustments, they often incur high computational costs and latency when handling structured, text-centric, or batch processing tasks. In this paper, we propose Talk-to-Your-Slides, a high-efficiency slide editing agent that operates via language-driven structured data manipulation rather than relying on the image modality. By leveraging the underlying object model instead of screen pixels, our approach ensures precise content modification while preserving style fidelity, addressing the limitations of OCR-based visual agents. Our system features a hierarchical architecture that effectively bridges high-level user instructions with low-level execution codes. Experiments demonstrate that for text-centric and formatting tasks, our method enables 34% faster processing, achieves 34% better instruction fidelity, and operates at an 87% lower cost compared to GUI-based baselines. Furthermore, we introduce TSBench, a human-verified benchmark dataset comprising 379 instructions, including a Hard subset designed to evaluate robustness against complex and visually dependent queries. Our code and benchmark are available at https://drive.google.com/drive/folders/1onwp5m7t3207xZu7HEBTMpdivsiOuqG8?usp=share_link
Anthology ID:
2026.findings-acl.166
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3370–3399
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.166/
DOI:
Bibkey:
Cite (ACL):
Kyudan Jung, Hojun Cho, Jooyeol Yun, Soyoung Yang, Jaehyeok Jang, and Jaegul Choo. 2026. Talk to Your Slides: High-Efficiency Slide Editing via Language-Driven Structured Data Manipulation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 3370–3399, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Talk to Your Slides: High-Efficiency Slide Editing via Language-Driven Structured Data Manipulation (Jung et al., Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.166.pdf
Checklist:
 2026.findings-acl.166.checklist.pdf