Abstract
Every day, billions of multimodal posts containing both images and text are shared on social media sites such as Snapchat, Twitter, and Instagram. This combination of image and text in a single message allows for more creative and expressive forms of communication, and has become increasingly common on such sites. This new paradigm brings new challenges for natural language understanding, as the textual component tends to be shorter and more informal, and is often only understood in combination with the visual context. In this paper, we explore the task of name tagging in multimodal social media posts. We start by creating two new multimodal datasets: the first based on Twitter posts and the second based on Snapchat captions (exclusively submitted to public and crowd-sourced stories). We then propose a novel model architecture based on Visual Attention that not only provides deeper visual insight into the model's decisions, but also significantly outperforms other state-of-the-art baseline methods for this task.
- Anthology ID: P18-1185
- Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: July
- Year: 2018
- Address: Melbourne, Australia
- Editors: Iryna Gurevych, Yusuke Miyao
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 1990–1999
- URL: https://aclanthology.org/P18-1185
- DOI: 10.18653/v1/P18-1185
- Cite (ACL): Di Lu, Leonardo Neves, Vitor Carvalho, Ning Zhang, and Heng Ji. 2018. Visual Attention Model for Name Tagging in Multimodal Social Media. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1990–1999, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal): Visual Attention Model for Name Tagging in Multimodal Social Media (Lu et al., ACL 2018)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/P18-1185.pdf