Exploiting Commonsense Knowledge about Objects for Visual Activity Recognition

Tianyu Jiang, Ellen Riloff


Abstract
Situation recognition is the task of recognizing the activity depicted in an image, including the people and objects involved. Previous models for this task typically train a classifier to identify the activity using a backbone image feature extractor. We propose that commonsense knowledge about the objects depicted in an image can also be a valuable source of information for activity identification. Previous NLP research has argued that knowledge about the prototypical functions of physical objects is important for language understanding, and NLP techniques have been developed to acquire this knowledge. Our work investigates whether this prototypical function knowledge can also be beneficial for visual situation recognition. We build a framework that incorporates this type of commonsense knowledge in a transformer-based model that is trained to predict the action verb for situation recognition. Our experimental results show that adding prototypical function knowledge about physical objects does improve performance for the visual activity recognition task.
Anthology ID:
2023.findings-acl.457
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7277–7285
URL:
https://aclanthology.org/2023.findings-acl.457
DOI:
10.18653/v1/2023.findings-acl.457
Cite (ACL):
Tianyu Jiang and Ellen Riloff. 2023. Exploiting Commonsense Knowledge about Objects for Visual Activity Recognition. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7277–7285, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Exploiting Commonsense Knowledge about Objects for Visual Activity Recognition (Jiang & Riloff, Findings 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.457.pdf