A Textual Modal Supplement Framework for Understanding Multi-Modal Figurative Language

Jiale Chen; Qihao Yang; Xuelian Dong; Xiaoling Mao; Tianyong Hao (郝天永)

Abstract

Figurative language in media such as memes, art, and comics has attracted considerable interest recently. However, it remains challenging to accurately judge and explain whether an image caption complements or contradicts the image it accompanies. To tackle this problem, we design MAPPER, a modal-supplement framework consisting of a describer and a thinker. The describer, based on a frozen large vision model, describes an image in detail to capture its entailed semantic information. The thinker, based on a fine-tuned large multi-modal model, uses the description, the claim, and the image to make a prediction and generate an explanation. Experimental results on a publicly available benchmark dataset from FigLang2024 Task 2 show that our method ranks first in the overall evaluation, exceeding the second-place system by 28.57%. This indicates that MAPPER is highly effective at understanding, judging, and explaining figurative language. The source code is available at https://github.com/Libv-Team/figlang2024.
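
The two-stage flow described in the abstract (describe the image first, then reason over the image, the claim, and the description together) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `Verdict`, `run_pipeline`, the stub models, and the label strings are hypothetical and are not taken from the authors' released code. The stubs stand in for the frozen vision model and the fine-tuned multi-modal model, which are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical output container: a label plus a free-text explanation."""
    label: str
    explanation: str


# Hypothetical interfaces: a describer maps an image to text; a thinker maps
# (image, claim, description) to a verdict.
Describer = Callable[[bytes], str]
Thinker = Callable[[bytes, str, str], Verdict]


def run_pipeline(image: bytes, claim: str,
                 describer: Describer, thinker: Thinker) -> Verdict:
    """Stage 1 produces a textual description; stage 2 consumes it with the inputs."""
    description = describer(image)
    return thinker(image, claim, description)


def stub_describer(image: bytes) -> str:
    # Placeholder for a frozen large vision model (assumption, not the real model).
    return f"an image of {len(image)} bytes"


def stub_thinker(image: bytes, claim: str, description: str) -> Verdict:
    # Placeholder for a fine-tuned multi-modal model (assumption, not the real model).
    label = "complements" if claim else "contradicts"
    return Verdict(label, f"Claim {claim!r} judged against description {description!r}.")


result = run_pipeline(b"\x00\x01", "The sky is crying", stub_describer, stub_thinker)
print(result.label)
print(result.explanation)
```

Keeping the describer and the thinker behind plain callables means either stage could be swapped for a real model without changing `run_pipeline`; this is a design choice of the sketch, not a claim about the authors' implementation.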

Anthology ID: 2024.figlang-1.12
Volume: Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)
Month: June
Year: 2024
Address: Mexico City, Mexico (Hybrid)
Editors: Debanjan Ghosh, Smaranda Muresan, Anna Feldman, Tuhin Chakrabarty, Emmy Liu
Venues: Fig-Lang | WS
Publisher: Association for Computational Linguistics
Pages: 85–91
URL: https://aclanthology.org/2024.figlang-1.12
Cite (ACL): Jiale Chen, Qihao Yang, Xuelian Dong, Xiaoling Mao, and Tianyong Hao. 2024. A Textual Modal Supplement Framework for Understanding Multi-Modal Figurative Language. In Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024), pages 85–91, Mexico City, Mexico (Hybrid). Association for Computational Linguistics.
Cite (Informal): A Textual Modal Supplement Framework for Understanding Multi-Modal Figurative Language (Chen et al., Fig-Lang-WS 2024)
PDF: https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.figlang-1.12.pdf
