CapEEN: Image Captioning with Early Exits and Knowledge Distillation

Divya Jyoti Bajpai, Manjesh Kumar Hanawal


Abstract
Deep neural networks (DNNs) have made significant progress in recognizing visual elements and generating descriptive text in image-captioning tasks. However, their improved performance comes at the cost of increased computational burden and inference latency. Early Exit (EE) strategies can be used to enhance their efficiency, but adapting them to image captioning is challenging because accurate predictions require varying levels of semantic information. To overcome this, we introduce CapEEN, which improves the performance of EE strategies using knowledge distillation. Inference in CapEEN is completed at an intermediate layer if the prediction confidence exceeds a predefined threshold learned from the training data. To account for real-world deployments, where the target distribution could drift from that of the training samples, we introduce a variant, A-CapEEN, that adapts the thresholds on the fly using a multi-armed bandit framework. Experiments on the MS COCO and Flickr30k datasets show that CapEEN achieves a speedup of 1.77× while maintaining competitive performance compared to the final layer, and A-CapEEN additionally offers robustness against distortions. The source code is available at https://github.com/Div290/CapEEN.
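The confidence-based exit rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the maximum softmax probability as the confidence score, and the precomputed `per_layer_logits` input are all assumptions made for illustration. A real early-exit model would compute each layer lazily and stop as soon as an exit fires.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    per_layer_logits: Sequence[Sequence[float]],
    threshold: float,
) -> Tuple[int, int, float]:
    """Return (token_id, exit_layer, confidence) for one decoding step.

    per_layer_logits[i] holds the logits produced by the exit attached to
    layer i (hypothetical input, precomputed here only to keep the sketch short).
    Confidence is assumed to be the maximum softmax probability.
    """
    last = len(per_layer_logits) - 1
    for layer, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        confidence = max(probs)
        token_id = probs.index(confidence)
        # Exit once confidence exceeds the threshold; the final layer always answers.
        if confidence > threshold or layer == last:
            return token_id, layer, confidence
    raise ValueError("per_layer_logits must not be empty")


if __name__ == "__main__":
    # Toy logits for three exits over a three-token vocabulary.
    toy = [[0.1, 0.2, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 9.0]]
    print(early_exit_predict(toy, threshold=0.9))
```

In this sketch a lower `threshold` lets more samples exit at shallow layers (faster, possibly less accurate), while a threshold of 1.0 forces every sample through to the final layer. An adaptive variant such as A-CapEEN would adjust this value online rather than fixing it after training.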
Anthology ID:
2024.findings-emnlp.376
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6458–6472
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.376/
DOI:
10.18653/v1/2024.findings-emnlp.376
Cite (ACL):
Divya Jyoti Bajpai and Manjesh Kumar Hanawal. 2024. CapEEN: Image Captioning with Early Exits and Knowledge Distillation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6458–6472, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
CapEEN: Image Captioning with Early Exits and Knowledge Distillation (Bajpai & Hanawal, Findings 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.376.pdf