One More Modality: Does Abstract Meaning Representation Benefit Visual Question Answering?

Abhidip Bhattacharyya; Emma Markle; Shira Wein

doi:10.18653/v1/2025.findings-emnlp.82

Abstract

Visual Question Answering (VQA) requires a vision-language model to reason over both visual and textual inputs to answer questions about images. In this work, we investigate whether incorporating explicit semantic information, in the form of Abstract Meaning Representation (AMR) graphs, can enhance model performance, particularly in low-resource settings where training data is limited. We augment two vision-language models, LXMERT and BLIP-2, with sentence- and document-level AMRs and evaluate their performance under both full and reduced training data conditions. Our findings show that in well-resourced settings, models (in particular the smaller LXMERT) are negatively impacted by incorporating AMR without specialized training. However, in low-resource settings, AMR proves beneficial: LXMERT achieves up to a 13.1% relative gain using sentence-level AMRs. These results suggest that while adding AMR can lower performance in some settings, in a low-resource setting AMR can serve as a useful semantic prior, especially for lower-capacity models trained on limited data.

Anthology ID: 2025.findings-emnlp.82
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1560–1572
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.82/
DOI: 10.18653/v1/2025.findings-emnlp.82
Cite (ACL): Abhidip Bhattacharyya, Emma Markle, and Shira Wein. 2025. One More Modality: Does Abstract Meaning Representation Benefit Visual Question Answering?. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 1560–1572, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): One More Modality: Does Abstract Meaning Representation Benefit Visual Question Answering? (Bhattacharyya et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.82.pdf
Checklist: 2025.findings-emnlp.82.checklist.pdf
