Modality-specific Learning Rates for Effective Multimodal Additive Late-fusion

Yiqun Yao, Rada Mihalcea


Abstract
In multimodal machine learning, additive late-fusion is a straightforward approach to combine the feature representations from different modalities, in which the final prediction can be formulated as the sum of unimodal predictions. While it has been found that certain late-fusion models can achieve competitive performance with lower computational costs compared to complex multimodal interactive models, how to effectively search for a good late-fusion model is still an open question. Moreover, for different modalities, the best unimodal models may work under significantly different learning rates due to the nature of the modality and the computational flow of the model; thus, selecting a global learning rate for late-fusion models can result in a vanishing gradient for some modalities. To help address these issues, we propose a Modality-Specific Learning Rate (MSLR) method to effectively build late-fusion multimodal models from fine-tuned unimodal models. We investigate three different strategies to assign learning rates to different modalities. Our experiments show that MSLR outperforms global learning rates on multiple tasks and settings, and enables the models to effectively learn each modality.
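The two ideas in the abstract, a fused prediction formed as the sum of unimodal predictions and a separate learning rate per modality, can be illustrated with a toy sketch. This is not the authors' implementation, and it does not reproduce any of the paper's three assignment strategies. The scalar linear "models", the modality names `text` and `audio`, the data, and the learning-rate values are all assumptions chosen for illustration. The text feature is given a much smaller scale than the audio feature, so under a single global rate its weight barely moves.

```python
# Toy additive late-fusion with per-modality learning rates.
# Everything here (names, data, rates) is a hypothetical illustration,
# not the method or the models from the paper.

def fused_prediction(weights, features):
    """Additive late-fusion: sum the unimodal (scalar linear) predictions."""
    return sum(weights[m] * features[m] for m in weights)

def sgd_step(weights, features, target, learning_rates):
    """One squared-error SGD step, using a separate learning rate per modality."""
    error = fused_prediction(weights, features) - target
    return {
        m: weights[m] - learning_rates[m] * 2.0 * error * features[m]
        for m in weights
    }

def train(data, learning_rates, epochs=200):
    """Train from zero-initialised weights and return the final weights."""
    weights = {"text": 0.0, "audio": 0.0}
    for _ in range(epochs):
        for features, target in data:
            weights = sgd_step(weights, features, target, learning_rates)
    return weights

# Targets were generated with text weight 1.0 and audio weight 2.0.
# The text feature is roughly 100x smaller in scale than the audio feature.
data = [
    ({"text": 0.01, "audio": 1.0}, 2.01),
    ({"text": 0.02, "audio": -1.0}, -1.98),
    ({"text": -0.01, "audio": 0.5}, 0.99),
]

# One global learning rate shared by both modalities.
global_weights = train(data, {"text": 0.1, "audio": 0.1})

# Modality-specific learning rates (hand-picked for this toy problem).
mslr_weights = train(data, {"text": 1000.0, "audio": 0.1})

print("global rate:      ", global_weights)
print("modality-specific:", mslr_weights)
```

In this toy setting the shared rate leaves the text weight close to its initial value, because its gradient is scaled down by the small feature, while the per-modality rates let both weights reach the values that generated the data. How to choose such rates in practice is the question the paper itself addresses.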
Anthology ID:
2022.findings-acl.143
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1824–1834
URL:
https://aclanthology.org/2022.findings-acl.143
DOI:
10.18653/v1/2022.findings-acl.143
Cite (ACL):
Yiqun Yao and Rada Mihalcea. 2022. Modality-specific Learning Rates for Effective Multimodal Additive Late-fusion. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1824–1834, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Modality-specific Learning Rates for Effective Multimodal Additive Late-fusion (Yao & Mihalcea, Findings 2022)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-acl.143.pdf
Data:
MELD