Abstract
Large-scale multi-modal classification aims to distinguish between different categories of multi-modal data, and it has drawn increasing attention over the last decade. In this paper, we propose a multi-task learning-based framework for the multi-modal classification task, which consists of two branches: a multi-modal autoencoder branch and an attention-based multi-modal modeling branch. The multi-modal autoencoder receives multi-modal features and extracts their interactive information, which we call the multi-modal encoder feature, and uses this feature to reconstruct all of the input data. In addition, the multi-modal encoder feature can be used to enrich the raw dataset and improve the performance of downstream tasks such as classification. In the attention-based multi-modal modeling branch, we first employ an attention mechanism to make the model focus on important features, and then use the multi-modal encoder feature to enrich the input information and achieve better performance. We conduct extensive experiments on different datasets, and the results demonstrate the effectiveness of the proposed framework.
- Anthology ID:
- 2021.maiworkshop-1.5
- Volume:
- Proceedings of the Third Workshop on Multimodal Artificial Intelligence
- Month:
- June
- Year:
- 2021
- Address:
- Mexico City, Mexico
- Venue:
- maiworkshop
- Publisher:
- Association for Computational Linguistics
- Pages:
- 30–35
- URL:
- https://aclanthology.org/2021.maiworkshop-1.5
- DOI:
- 10.18653/v1/2021.maiworkshop-1.5
- Cite (ACL):
- Danting Zeng. 2021. Multi Task Learning based Framework for Multimodal Classification. In Proceedings of the Third Workshop on Multimodal Artificial Intelligence, pages 30–35, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Multi Task Learning based Framework for Multimodal Classification (Zeng, maiworkshop 2021)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2021.maiworkshop-1.5.pdf
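The two-branch design described in the abstract can be sketched as follows. This is purely an illustrative toy, not the paper's actual implementation: the modality dimensions, the single-layer encoder/decoders, the per-feature attention, and all weight matrices are assumptions, with random weights standing in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical feature sizes for two modalities (e.g. text and image),
# the shared latent, and the number of classes.
D_TXT, D_IMG, D_ENC, N_CLS = 8, 6, 4, 3

# Random stand-ins for trained parameters.
W_enc = rng.normal(size=(D_TXT + D_IMG, D_ENC)) * 0.1
W_dec_txt = rng.normal(size=(D_ENC, D_TXT)) * 0.1
W_dec_img = rng.normal(size=(D_ENC, D_IMG)) * 0.1
w_att = rng.normal(size=(D_TXT + D_IMG + D_ENC,))
W_cls = rng.normal(size=(D_TXT + D_IMG + D_ENC, N_CLS)) * 0.1

def autoencoder_branch(txt, img):
    """Fuse the modalities into a shared 'multi-modal encoder feature'
    and reconstruct every input modality from it."""
    fused = np.concatenate([txt, img], axis=-1)
    enc = relu(fused @ W_enc)                  # interactive representation
    return enc, enc @ W_dec_txt, enc @ W_dec_img

def attention_branch(txt, img, enc):
    """Enrich the raw features with the encoder feature, then apply a
    simple per-feature attention weighting before classification."""
    enriched = np.concatenate([txt, img, enc], axis=-1)
    attn = softmax(enriched * w_att)           # feature-importance weights
    return softmax((attn * enriched) @ W_cls)  # class probabilities

txt = rng.normal(size=(2, D_TXT))              # batch of two examples
img = rng.normal(size=(2, D_IMG))
enc, rec_txt, rec_img = autoencoder_branch(txt, img)
probs = attention_branch(txt, img, enc)
```

In a multi-task training setup, the reconstruction errors of `rec_txt` and `rec_img` and the classification loss on `probs` would be optimized jointly, so the encoder feature is shaped by both objectives.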