On the (In)Effectiveness of Images for Text Classification

Chunpeng Ma, Aili Shen, Hiyori Yoshikawa, Tomoya Iwakura, Daniel Beck, Timothy Baldwin


Abstract
Images are core components of multi-modal learning in natural language processing (NLP), yet results have varied substantially as to whether images improve NLP tasks. One confounding factor is that previous NLP research has focused on sophisticated tasks in varying settings, generally applied to English only. We focus on text classification, in the context of assigning named entity classes to a given Wikipedia page, where images generally complement the text and the Wikipedia page can be in any of a number of languages. Our experiments across a range of languages show that images complement NLP models (including BERT) trained without external pre-training, but when combined with BERT models pre-trained on large-scale external data, images contribute nothing.
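As a concrete illustration of the kind of model the abstract describes, here is a minimal late-fusion sketch; it is not the authors' actual architecture. It concatenates a multilingual BERT [CLS] embedding of the page text with global-pooled ResNet image features and feeds the result to a linear classifier over named entity classes. The encoder choices (bert-base-multilingual-cased, ResNet-50) and the concatenation-based fusion are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical late-fusion baseline (illustrative sketch, not the paper's model).
import torch
import torch.nn as nn
from transformers import BertModel
from torchvision.models import resnet50


class MultimodalPageClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Multilingual text encoder, matching the cross-lingual Wikipedia setting.
        self.text_encoder = BertModel.from_pretrained("bert-base-multilingual-cased")
        # Image encoder: ResNet-50 with its classification head removed;
        # weights=None gives a randomly initialized CNN (pre-trained weights
        # could be swapped in to mirror the pre-training conditions studied).
        cnn = resnet50(weights=None)
        self.image_encoder = nn.Sequential(*list(cnn.children())[:-1])
        fused_dim = self.text_encoder.config.hidden_size + 2048
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, input_ids, attention_mask, images):
        # The [CLS] token representation summarizes the page text.
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        # Global-pooled CNN features summarize the page image.
        img_feat = self.image_encoder(images).flatten(1)
        # Late fusion: concatenate the two modalities, then classify.
        return self.classifier(torch.cat([text_feat, img_feat], dim=-1))
```

In terms of this sketch, the paper's finding amounts to the image branch improving accuracy only when the text encoder is trained without external pre-training; with large-scale pre-trained BERT weights, the text branch alone suffices.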
Anthology ID: 2021.eacl-main.4
Volume: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month: April
Year: 2021
Address: Online
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 42–48
URL: https://aclanthology.org/2021.eacl-main.4
DOI: 10.18653/v1/2021.eacl-main.4
Cite (ACL):
Chunpeng Ma, Aili Shen, Hiyori Yoshikawa, Tomoya Iwakura, Daniel Beck, and Timothy Baldwin. 2021. On the (In)Effectiveness of Images for Text Classification. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 42–48, Online. Association for Computational Linguistics.
Cite (Informal):
On the (In)Effectiveness of Images for Text Classification (Ma et al., EACL 2021)
PDF: https://preview.aclanthology.org/ingestion-script-update/2021.eacl-main.4.pdf
Data: VCR, Visual Genome