Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis

Miao Zhou, Lina Yang, Thomas Wu, Dongnan Yang, Xinru Zhang


Abstract
Multimodal Sentiment Analysis (MSA) is the task of understanding human emotions by analyzing a combination of different data sources, such as text, audio, and visual inputs. Although recent advances have improved emotion modeling across modalities, existing methods still struggle with two fundamental challenges: balancing global and fine-grained sentiment contributions, and over-reliance on the text modality. To address these issues, we propose DPDF-LQ (Dual-Path Dynamic Fusion with Learnable Query), an architecture that processes inputs through two complementary paths: global and local. The global path is responsible for establishing cross-modal dependencies, while the local path captures fine-grained representations. Additionally, we introduce the key module Dynamic Global Learnable Query Attention (DGLQA) in the global path, which dynamically allocates weights to each modality to capture their relevant features and learn global representations. Extensive experiments on the CMU-MOSI and CMU-MOSEI benchmarks demonstrate that DPDF-LQ achieves state-of-the-art performance, particularly in fine-grained sentiment prediction by effectively combining global and local features. Our code will be released at https://github.com/ZhouMiaoGX/DPDF-LQ.
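The abstract describes a module that dynamically allocates weights to each modality through a learnable query. As a rough illustration of that general idea only, the following is a minimal sketch of query-based attention pooling over three modality feature vectors. It is not the authors' DGLQA implementation: the function names (`softmax`, `query_fusion`), the scaled dot-product scoring, the feature values, and the fixed query vector are all assumptions made for illustration. In a trained model the query would be a learned parameter rather than a constant.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_fusion(modality_feats, query):
    """Fuse per-modality feature vectors with a single query vector.

    modality_feats: dict mapping modality name -> feature vector (list of floats)
    query: query vector of the same dimension (hypothetical stand-in for a
           learned parameter)
    Returns the fused vector and the per-modality attention weights.
    """
    dim = len(query)
    names = list(modality_feats)
    # Scaled dot-product score between the query and each modality feature.
    scores = [
        sum(q * f for q, f in zip(query, modality_feats[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality features gives one fused representation.
    fused = [
        sum(w * modality_feats[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy inputs (made-up numbers, not real features).
feats = {
    "text": [0.9, 0.1, 0.3, 0.5],
    "audio": [0.2, 0.8, 0.1, 0.4],
    "visual": [0.4, 0.3, 0.7, 0.2],
}
query = [1.0, 0.5, -0.5, 0.25]

fused, weights = query_fusion(feats, query)
```

Because the weights come from a softmax, they are positive and sum to one, so no single modality is hard-coded as dominant. How much each modality contributes depends on the query and on the input features.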
Anthology ID:
2025.emnlp-main.571
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11366–11376
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.571/
Cite (ACL):
Miao Zhou, Lina Yang, Thomas Wu, Dongnan Yang, and Xinru Zhang. 2025. Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 11366–11376, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis (Zhou et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.571.pdf
Checklist:
 2025.emnlp-main.571.checklist.pdf