Deconstructing multimodality: visual properties and visual context in human semantic processing

Christopher Davis, Luana Bulat, Anita Lilla Vero, Ekaterina Shutova


Abstract
Multimodal semantic models that extend linguistic representations with additional perceptual input have proved successful in a range of natural language processing (NLP) tasks. Recent research has used neural methods to automatically create visual representations for words. However, this work has extracted visual features from complete images and has not examined how different kinds of visual information affect performance. In contrast, we construct multimodal models that differentiate between the internal visual properties of objects and their external visual context. We evaluate the models on the task of decoding brain activity associated with the meanings of nouns, demonstrating their advantage over models based on complete images.
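The sketch below illustrates the general setup the abstract describes, not the authors' implementation: a linguistic word vector is fused with separate "visual property" and "visual context" vectors by L2-normalised concatenation, and the fused representation is evaluated by learning a linear map to per-word brain activation patterns, scored with a leave-two-out pairwise matching test. The fusion strategy, the ridge penalty, and all array shapes are assumptions, and the data are random placeholders.

```python
# Minimal, illustrative sketch: multimodal fusion plus a brain-decoding
# evaluation. Shapes, fusion method, and hyperparameters are assumed.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_words, d_ling, d_vis, n_voxels = 30, 300, 128, 500

ling = rng.normal(size=(n_words, d_ling))      # linguistic embeddings
vis_prop = rng.normal(size=(n_words, d_vis))   # object-internal visual properties
vis_ctx = rng.normal(size=(n_words, d_vis))    # external visual context
brain = rng.normal(size=(n_words, n_voxels))   # brain activation per word (placeholder)

def l2_normalise(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Middle fusion: concatenate the normalised modalities.
multimodal = np.hstack([l2_normalise(ling),
                        l2_normalise(vis_prop),
                        l2_normalise(vis_ctx)])

def leave_two_out_accuracy(X, Y, alpha=1.0):
    """Hold out two words, predict their activations from the fused vectors,
    and check whether the correct pairing scores higher than the swapped one."""
    correct, total = 0, 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            train = [k for k in range(len(X)) if k not in (i, j)]
            model = Ridge(alpha=alpha).fit(X[train], Y[train])
            pred_i, pred_j = model.predict(X[[i, j]])
            match = cos(pred_i, Y[i]) + cos(pred_j, Y[j])
            mismatch = cos(pred_i, Y[j]) + cos(pred_j, Y[i])
            correct += int(match > mismatch)
            total += 1
    return correct / total

print(f"leave-two-out accuracy: {leave_two_out_accuracy(multimodal, brain):.3f}")
```

With random placeholder data the accuracy hovers around chance (0.5); with real embeddings and brain-imaging data, the same evaluation compares how well different feature sets (properties, context, or complete images) support decoding.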
Anthology ID:
S19-1013
Volume:
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venues:
SemEval | *SEM
SIGs:
SIGSEM | SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
118–124
URL:
https://aclanthology.org/S19-1013
DOI:
10.18653/v1/S19-1013
Cite (ACL):
Christopher Davis, Luana Bulat, Anita Lilla Vero, and Ekaterina Shutova. 2019. Deconstructing multimodality: visual properties and visual context in human semantic processing. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pages 118–124, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Deconstructing multimodality: visual properties and visual context in human semantic processing (Davis et al., SemEval-*SEM 2019)
PDF:
https://aclanthology.org/S19-1013.pdf
Data
ImageNet | Visual Genome