Abstract
Recent research has made impressive progress in large-scale multimodal pre-training. In the context of the rapid growth of model size, it is necessary to seek efficient and flexible methods other than finetuning. In this paper, we propose to use prompt vectors to align the modalities. Our method achieves comparable performance to several other multimodal fusion methods in low-resource settings. We further show that our method is modular and parameter-efficient for processing tasks involving two or more data modalities.

- Anthology ID:
- 2022.findings-acl.234
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Smaranda Muresan, Preslav Nakov, Aline Villavicencio
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2976–2985
- URL:
- https://aclanthology.org/2022.findings-acl.234
- DOI:
- 10.18653/v1/2022.findings-acl.234
- Cite (ACL):
- Sheng Liang, Mengjie Zhao, and Hinrich Schuetze. 2022. Modular and Parameter-Efficient Multimodal Fusion with Prompting. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2976–2985, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Modular and Parameter-Efficient Multimodal Fusion with Prompting (Liang et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-acl.234.pdf