Modular and Parameter-Efficient Multimodal Fusion with Prompting

Sheng Liang, Mengjie Zhao, Hinrich Schuetze


Abstract
Recent research has made impressive progress in large-scale multimodal pre-training. As model sizes grow rapidly, efficient and flexible alternatives to finetuning become necessary. In this paper, we propose to use prompt vectors to align the modalities. Our method achieves performance comparable to several other multimodal fusion methods in low-resource settings. We further show that our method is modular and parameter-efficient when processing tasks involving two or more data modalities.
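The core idea the abstract describes, prepending learnable prompt vectors to align modalities while the pretrained backbone stays frozen, can be illustrated with a minimal sketch. This is an assumed setup for illustration only, not the paper's exact architecture: the class name, dimensions, and the use of a small `nn.TransformerEncoder` as a stand-in for a pretrained language model are all hypothetical.

```python
import torch
import torch.nn as nn

class PromptFusion(nn.Module):
    """Hypothetical sketch of prompt-based multimodal fusion: learnable
    prompt vectors are prepended to projected image features and text
    embeddings; only the prompts and the image projection are trained,
    while the (stand-in) pretrained text encoder is kept frozen."""

    def __init__(self, d_model=64, n_prompts=4, img_dim=128, vocab=1000):
        super().__init__()
        # Trainable parts: prompt vectors + a linear adapter for image features.
        self.prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
        self.img_proj = nn.Linear(img_dim, d_model)
        # Frozen "pretrained" backbone (a toy encoder standing in for a PLM).
        self.tok_emb = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        for p in list(self.tok_emb.parameters()) + list(self.encoder.parameters()):
            p.requires_grad = False

    def forward(self, img_feats, token_ids):
        # img_feats: (B, n_img, img_dim); token_ids: (B, n_tok)
        b = img_feats.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        seq = torch.cat(
            [prompts, self.img_proj(img_feats), self.tok_emb(token_ids)], dim=1
        )
        return self.encoder(seq)

model = PromptFusion()
out = model(torch.randn(2, 3, 128), torch.randint(0, 1000, (2, 5)))
print(out.shape)  # batch 2, sequence = 4 prompts + 3 image + 5 text tokens
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable, total)
```

The trainable-parameter count (prompts plus the image adapter) is a tiny fraction of the full model, which is the sense in which such a method is parameter-efficient; the frozen backbone can also be reused across tasks, which is the modularity claim.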
Anthology ID: 2022.findings-acl.234
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2976–2985
URL: https://aclanthology.org/2022.findings-acl.234
DOI: 10.18653/v1/2022.findings-acl.234
Cite (ACL):
Sheng Liang, Mengjie Zhao, and Hinrich Schuetze. 2022. Modular and Parameter-Efficient Multimodal Fusion with Prompting. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2976–2985, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Modular and Parameter-Efficient Multimodal Fusion with Prompting (Liang et al., Findings 2022)
PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-acl.234.pdf
Software: 2022.findings-acl.234.software.zip