DFAT: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition

Nhi Nguyen Yen Truong, Sang Le Quang, Huy Tran Quang, Tri Pham Xuan, Duong Tran Ham, Binh Tran Le Hai, Tin Huynh, Kiem Hoang


Anthology ID:
2025.vlsp-1.6
Volume:
Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing
Month:
October
Year:
2025
Address:
Hanoi, Vietnam
Editors:
Luong Chi Mai, Nguyen Thi Minh Huyen, Nguyen Thi Thu Trang
Venues:
VLSP | WS
Publisher:
Association for Computational Linguistics
Pages:
36–44
URL:
https://preview.aclanthology.org/author-page-lei-gao-usc/2025.vlsp-1.6/
Cite (ACL):
Nhi Nguyen Yen Truong, Sang Le Quang, Huy Tran Quang, Tri Pham Xuan, Duong Tran Ham, Binh Tran Le Hai, Tin Huynh, and Kiem Hoang. 2025. DFAT: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition. In Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing, pages 36–44, Hanoi, Vietnam. Association for Computational Linguistics.
Cite (Informal):
DFAT: Dual-stage Fusion of Acoustic and Text feature for Speech Emotion Recognition (Truong et al., VLSP 2025)
PDF:
https://preview.aclanthology.org/author-page-lei-gao-usc/2025.vlsp-1.6.pdf