Abstract
Multimodal machine learning involves integrating and modeling information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, HCI, and healthcare. This tutorial, building upon a new edition of a survey paper on multimodal ML as well as previously-given tutorials and academic courses, will describe an updated taxonomy on multimodal machine learning synthesizing its core technical challenges and major directions for future research.
- Anthology ID:
- 2022.naacl-tutorials.5
- Volume:
- Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 33–38
- URL:
- https://aclanthology.org/2022.naacl-tutorials.5
- DOI:
- 10.18653/v1/2022.naacl-tutorials.5
- Cite (ACL):
- Louis-Philippe Morency, Paul Pu Liang, and Amir Zadeh. 2022. Tutorial on Multimodal Machine Learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts, pages 33–38, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- Tutorial on Multimodal Machine Learning (Morency et al., NAACL 2022)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2022.naacl-tutorials.5.pdf
- Data
- Visual Question Answering