Abstract
Situation recognition is the task of recognizing the activity depicted in an image, including the people and objects involved. Previous models for this task typically train a classifier to identify the activity using a backbone image feature extractor. We propose that commonsense knowledge about the objects depicted in an image can also be a valuable source of information for activity identification. Previous NLP research has argued that knowledge about the prototypical functions of physical objects is important for language understanding, and NLP techniques have been developed to acquire this knowledge. Our work investigates whether this prototypical function knowledge can also be beneficial for visual situation recognition. We build a framework that incorporates this type of commonsense knowledge in a transformer-based model that is trained to predict the action verb for situation recognition. Our experimental results show that adding prototypical function knowledge about physical objects does improve performance for the visual activity recognition task.
- Anthology ID:
- 2023.findings-acl.457
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7277–7285
- URL:
- https://aclanthology.org/2023.findings-acl.457
- DOI:
- 10.18653/v1/2023.findings-acl.457
- Cite (ACL):
- Tianyu Jiang and Ellen Riloff. 2023. Exploiting Commonsense Knowledge about Objects for Visual Activity Recognition. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7277–7285, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Exploiting Commonsense Knowledge about Objects for Visual Activity Recognition (Jiang & Riloff, Findings 2023)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.457.pdf