AGIC: Attention-Guided Image Captioning to Improve Caption Relevance

L D M S Sai Teja, Ashok Urlana, Pruthwik Mishra


Abstract
Despite significant progress in image captioning, generating accurate and descriptive captions remains a long-standing challenge. In this study, we propose Attention-Guided Image Captioning (AGIC), which amplifies salient visual regions directly in the feature space to guide caption generation. We further introduce a hybrid decoding strategy that combines deterministic and probabilistic sampling to balance fluency and diversity. To evaluate AGIC, we conduct extensive experiments on the Flickr8k, Flickr30k and MSCOCO datasets. The results show that AGIC matches or surpasses several state-of-the-art models while achieving faster inference. Moreover, AGIC demonstrates strong performance across multiple evaluation metrics, offering a scalable and interpretable solution for image captioning.
Anthology ID:
2026.findings-eacl.342
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6517–6528
URL:
https://preview.aclanthology.org/manual-author-scripts/2026.findings-eacl.342/
Cite (ACL):
L D M S Sai Teja, Ashok Urlana, and Pruthwik Mishra. 2026. AGIC: Attention-Guided Image Captioning to Improve Caption Relevance. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6517–6528, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance (Teja et al., Findings 2026)
PDF:
https://preview.aclanthology.org/manual-author-scripts/2026.findings-eacl.342.pdf
Checklist:
 2026.findings-eacl.342.checklist.pdf