CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations
Gitanjali Kumari, Arindam Chatterjee, Ashutosh Bajpai, Asif Ekbal, Vinutha B. NarayanaMurthy
Abstract
In this paper, we present CMCLIP, a Code-Mixed Contrastive Linked Image Pre-trained model, an innovative extension of the widely recognized CLIP model. Our work adapts the CLIP framework to the code-mixed environment through a novel cross-lingual teacher training methodology. Building on the strengths of CLIP, we introduce the first code-mixed pre-trained text-and-vision model, CMCLIP, specifically designed for Hindi-English code-mixed multimodal language settings. The model is developed in two variants: CMCLIP-RB, based on ResNet, and CMCLIP-VX, based on ViT, both of which adapt the original CLIP model to suit code-mixed data. We also introduce a large, novel dataset called Parallel Hybrid Multimodal Code-mixed Hinglish (PHMCH), which forms the foundation for teacher training. The CMCLIP models are evaluated on various downstream tasks, including code-mixed Image-Text Retrieval (ITR) and classification tasks, such as humor and sarcasm detection, using a code-mixed meme dataset. Our experimental results demonstrate that CMCLIP outperforms existing models, such as M3P and multilingual-CLIP, establishing state-of-the-art performance for code-mixed multimodal tasks. We would also like to assert that although our data and frameworks are on Hindi-English code-mix, they can be extended to any other code-mixed language settings.
- Anthology ID:
- 2024.icon-1.36
- Volume:
- Proceedings of the 21st International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2024
- Address:
- AU-KBC Research Centre, Chennai, India
- Editors:
- Sobha Lalitha Devi, Karunesh Arora
- Venue:
- ICON
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 311–323
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.icon-1.36/
- Cite (ACL):
- Gitanjali Kumari, Arindam Chatterjee, Ashutosh Bajpai, Asif Ekbal, and Vinutha B. NarayanaMurthy. 2024. CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 311–323, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
- Cite (Informal):
- CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations (Kumari et al., ICON 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.icon-1.36.pdf