Image Caption Generation Framework for Assamese News using Attention Mechanism

Ringki Das, Thoudam Doren Singh


Abstract
Automatic caption generation is an artificial intelligence problem at the intersection of computer vision and natural language processing. Although significant work has been reported on image captioning, it has been limited to English and a few other major languages with sufficient resources. No work on image captioning has been reported for a resource-constrained language like Assamese. Motivated by this gap, we propose an encoder-decoder framework for image caption generation in the Assamese news domain. A pre-trained VGG-16 model is employed on the encoder side, and an LSTM with an attention mechanism is employed on the decoder side to generate the Assamese caption. We train the proposed model on an in-house dataset of 10,000 images, each with a single caption. We describe our experimental methodology and present quantitative and qualitative results that validate the effectiveness of our model for caption generation. The proposed model achieves a BLEU score of 12.1, outperforming the baseline model.
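As a rough illustration of the attention step mentioned in the abstract, the following minimal sketch computes soft-attention weights over a handful of image-region feature vectors and combines them into a context vector for the decoder. It is not the authors' implementation: the function name `soft_attention`, the toy feature values, and the dot-product scoring are all assumptions made for illustration, and the paper may well use a learned scoring function over VGG-16 feature maps instead.

```python
import math


def soft_attention(region_features, hidden_state):
    """One soft-attention step over image regions (illustrative only).

    region_features: list of equal-length vectors, hypothetical stand-ins
                     for CNN region features.
    hidden_state:    decoder state vector of the same length.
    """
    # Assumption: plain dot-product scoring between each region and the state.
    scores = [
        sum(f * h for f, h in zip(feat, hidden_state))
        for feat in region_features
    ]

    # Numerically stable softmax turns the scores into attention weights.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The context vector is the attention-weighted sum of the region features.
    dim = len(region_features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy inputs: three 2-dimensional "regions" and a 2-dimensional decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = soft_attention(regions, state)
```

In an encoder-decoder captioner of this kind, a context vector like this would typically be fed to the LSTM at each decoding step, alongside the previously generated word.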
Anthology ID:
2021.icon-main.28
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
231–239
URL:
https://aclanthology.org/2021.icon-main.28
Cite (ACL):
Ringki Das and Thoudam Doren Singh. 2021. Image Caption Generation Framework for Assamese News using Attention Mechanism. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 231–239, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
Image Caption Generation Framework for Assamese News using Attention Mechanism (Das & Singh, ICON 2021)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2021.icon-main.28.pdf
Data:
BanglaLekhaImageCaptions, MS COCO, Visual Genome