Abstract
Multimodal learning is generally expected to make more accurate predictions than text-only analysis. Here, although various methods for fusing multimodal inputs have been proposed for sentiment analysis tasks, we found that they may be inhibiting their fusion methods, which are based on attention-based language models, from learning non-verbal modalities, because non-verbal ones are isolated from the linguistic semantics and contexts and do not include them, meaning that they are unsuitable for applying attention to text modalities during the fusion phase. To address this issue, we propose Word-aware Modality Stimulation Fusion (WA-MSF) for facilitating integration of non-verbal modalities with the text modality. The Modality Stimulation Unit layer (MSU-layer) is the core concept of WA-MSF; it integrates language contexts and semantics into non-verbal modalities, thereby instilling linguistic essence into these modalities. Moreover, WA-MSF uses aMLP in the fusion phase in order to utilize spatial and temporal representations of non-verbal modalities more effectively than transformer fusion. In our experiments, WA-MSF set a new state-of-the-art level of performance on sentiment prediction tasks.- Anthology ID:
- 2024.lrec-main.1536
- Original:
- 2024.lrec-main.1536v1
- Version 2:
- 2024.lrec-main.1536v2
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- SIG:
- Publisher:
- ELRA and ICCL
- Note:
- Pages:
- 17664–17674
- Language:
- URL:
- https://aclanthology.org/2024.lrec-main.1536
- DOI:
- Cite (ACL):
- Shuhei Tateishi, Yasuhito Osugi, and Makoto Nakatsuji. 2024. Word-Aware Modality Stimulation for Multimodal Fusion. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17664–17674, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Word-Aware Modality Stimulation for Multimodal Fusion (Tateishi et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2024.lrec-main.1536.pdf