Christoffer Heckman
2022
Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding
Abhidip Bhattacharyya
|
Cecilia Mauceri
|
Martha Palmer
|
Christoffer Heckman
Proceedings of the Thirteenth Language Resources and Evaluation Conference
As vision processing and natural language processing continue to advance, there is increasing interest in multimodal applications, such as image retrieval, caption generation, and human-robot interaction. These tasks require close alignment between the information in the images and text. In this paper, we present a new multimodal dataset that combines state of the art semantic annotation for language with the bounding boxes of corresponding images. This richer multimodal labeling supports cross-modal inference for applications in which such alignment is useful. Our semantic representations, developed in the natural language processing community, abstract away from the surface structure of the sentence, focusing on specific actions and the roles of their participants, a level that is equally relevant to images. We then utilize these representations in the form of semantic role labels in the captions and the images and demonstrate improvements in standard tasks such as image retrieval. The potential contributions of these additional labels is evaluated using a role-aware retrieval system based on graph convolutional and recurrent neural networks. The addition of semantic roles into this system provides a significant increase in capability and greater flexibility for these tasks, and could be extended to state-of-the-art techniques relying on transformers with larger amounts of annotated data.
2020
Leveraging Non-Specialists for Accurate and Time Efficient AMR Annotation
Mary Martin
|
Cecilia Mauceri
|
Martha Palmer
|
Christoffer Heckman
Proceedings of the LREC 2020 Workshop on "Citizen Linguistics in Language Resource Development"
Abstract Meaning Representations (AMRs), a syntax-free representation of phrase semantics are useful for capturing the meaning of a phrase and reflecting the relationship between concepts that are referred to. However, annotating AMRs are time consuming and expensive. The existing annotation process requires expertly trained workers who have knowledge of an extensive set of guidelines for parsing phrases. In this paper, we propose a cost-saving two-step process for the creation of a corpus of AMR-phrase pairs for spatial referring expressions. The first step uses non-specialists to perform simple annotations that can be leveraged in the second step to accelerate the annotation performed by the experts. We hypothesize that our process will decrease the cost per annotation and improve consistency across annotators. Few corpora of spatial referring expressions exist and the resulting language resource will be valuable for referring expression comprehension and generation modeling.
Search