Beyond Additive Fusion: Learning Non-Additive Multimodal Interactions
Torsten Wörtwein, Lisa Sheeber, Nicholas Allen, Jeffrey Cohn, Louis-Philippe Morency
Abstract
Multimodal fusion addresses the problem of analyzing spoken words in the multimodal context, including visual expressions and prosodic cues. Even when multimodal models lead to performance improvements, it is often unclear whether bimodal and trimodal interactions are learned or whether modalities are processed independently of each other. We propose Multimodal Residual Optimization (MRO) to separate unimodal, bimodal, and trimodal interactions in a multimodal model. This improves interpretability as the multimodal interaction can be quantified. Inspired by Occam’s razor, the main intuition of MRO is that (simpler) unimodal contributions should be learned before learning (more complex) bimodal and trimodal interactions. For example, bimodal predictions should learn to correct the mistakes (residuals) of unimodal predictions, thereby letting the bimodal predictions focus on the remaining bimodal interactions. Empirically, we observe that MRO successfully separates unimodal, bimodal, and trimodal interactions while not degrading predictive performance. We complement our empirical results with a human perception study and observe that MRO learns multimodal interactions that align with human judgments.
- Anthology ID:
- 2022.findings-emnlp.344
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2022
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4681–4696
- URL:
- https://aclanthology.org/2022.findings-emnlp.344
- DOI:
- 10.18653/v1/2022.findings-emnlp.344
- Cite (ACL):
- Torsten Wörtwein, Lisa Sheeber, Nicholas Allen, Jeffrey Cohn, and Louis-Philippe Morency. 2022. Beyond Additive Fusion: Learning Non-Additive Multimodal Interactions. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4681–4696, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Beyond Additive Fusion: Learning Non-Additive Multimodal Interactions (Wörtwein et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/revert-3132-ingestion-checklist/2022.findings-emnlp.344.pdf
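
The residual intuition stated in the abstract (unimodal contributions are learned first, and the bimodal part only corrects what is left over) can be illustrated with a toy sketch. This is not the authors' training procedure: the abstract does not give the optimization details, so the sketch assumes two scalar "modalities" `a` and `b`, plain least squares in place of a neural model, and a product feature `a * b` as the only bimodal interaction. All function and variable names are hypothetical.

```python
from itertools import product


def fit_scalar(x, r):
    """Least-squares slope (no intercept) for r ~ w * x."""
    return sum(xi * ri for xi, ri in zip(x, r)) / sum(xi * xi for xi in x)


def fit_additive(a, b, y):
    """Least-squares weights for y ~ w_a * a + w_b * b (2x2 normal equations)."""
    saa = sum(ai * ai for ai in a)
    sbb = sum(bi * bi for bi in b)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    say = sum(ai * yi for ai, yi in zip(a, y))
    sby = sum(bi * yi for bi, yi in zip(b, y))
    det = saa * sbb - sab * sab
    return (say * sbb - sab * sby) / det, (saa * sby - sab * say) / det


# Toy data (assumption): a full grid over {-1, 1}^2, with a target that mixes
# additive unimodal terms and one non-additive bimodal term.
grid = list(product([-1.0, 1.0], repeat=2))
a = [p[0] for p in grid]
b = [p[1] for p in grid]
y = [2.0 * ai + 3.0 * bi + 1.5 * ai * bi for ai, bi in grid]

# Stage 1: fit the simpler, purely additive unimodal model first.
w_a, w_b = fit_additive(a, b, y)
unimodal_pred = [w_a * ai + w_b * bi for ai, bi in grid]

# Stage 2: the bimodal term is fit only to the unimodal residuals.
residual = [yi - ui for yi, ui in zip(y, unimodal_pred)]
ab = [ai * bi for ai, bi in grid]
w_ab = fit_scalar(ab, residual)
bimodal_pred = [w_ab * x for x in ab]

# Quantify the interaction as the bimodal share of the summed squared
# contributions of both stages.
unimodal_energy = sum(u * u for u in unimodal_pred)
bimodal_energy = sum(v * v for v in bimodal_pred)
bimodal_share = bimodal_energy / (unimodal_energy + bimodal_energy)

print(f"w_a={w_a:.2f}, w_b={w_b:.2f}, w_ab={w_ab:.2f}")
print(f"bimodal share of prediction energy: {bimodal_share:.3f}")
```

Because the bimodal stage sees only residuals, anything the additive model can already explain stays attributed to the unimodal terms, which is what makes the size of the bimodal contribution measurable in this toy setting.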