MM-GATBT: Enriching Multimodal Representation Using Graph Attention Network

Seung Byum Seo; Hyoungwook Nam; Payam Delgosha

doi:10.18653/v1/2022.naacl-srw.14

Abstract

While Natural Language Processing (NLP) has advanced considerably, much of that success comes from applying self-attention mechanisms to single or multiple modalities. Although this approach has brought significant improvements in multiple downstream tasks, it fails to capture the interactions between different entities. We therefore propose MM-GATBT, a multimodal graph representation learning model that captures not only the relational semantics within one modality but also the interactions between different modalities. Specifically, the proposed method constructs image-based node embeddings that contain the relational semantics of entities. Our empirical results show that MM-GATBT achieves state-of-the-art results among all published papers on the MM-IMDb dataset.

Anthology ID: 2022.naacl-srw.14
Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Month: July
Year: 2022
Address: Hybrid: Seattle, Washington + Online
Editors: Daphne Ippolito, Liunian Harold Li, Maria Leonor Pacheco, Danqi Chen, Nianwen Xue
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 106–112
URL: https://aclanthology.org/2022.naacl-srw.14
DOI: 10.18653/v1/2022.naacl-srw.14
Cite (ACL): Seung Byum Seo, Hyoungwook Nam, and Payam Delgosha. 2022. MM-GATBT: Enriching Multimodal Representation Using Graph Attention Network. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pages 106–112, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
Cite (Informal): MM-GATBT: Enriching Multimodal Representation Using Graph Attention Network (Seo et al., NAACL 2022)
PDF: https://preview.aclanthology.org/nschneid-patch-3/2022.naacl-srw.14.pdf
Video: https://preview.aclanthology.org/nschneid-patch-3/2022.naacl-srw.14.mp4
Code: sbseo/mm-gatbt
