CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations

Gitanjali Kumari; Arindam Chatterjee; Ashutosh Bajpai; Asif Ekbal; Vinutha B. NarayanaMurthy

Abstract

In this paper, we present CMCLIP, a Code-Mixed Contrastive Linked Image Pre-trained model that extends the widely used CLIP model. Our work adapts the CLIP framework to the code-mixed setting through a novel cross-lingual teacher training methodology. Building on the strengths of CLIP, we introduce CMCLIP as the first code-mixed pre-trained text-and-vision model, designed specifically for Hindi-English code-mixed multimodal settings. The model comes in two variants, CMCLIP-RB (ResNet-based) and CMCLIP-VX (ViT-based), both of which adapt the original CLIP model to code-mixed data. We also introduce a large, novel dataset, Parallel Hybrid Multimodal Code-mixed Hinglish (PHMCH), which forms the foundation for teacher training. We evaluate the CMCLIP models on several downstream tasks: code-mixed Image-Text Retrieval (ITR) and classification tasks such as humor and sarcasm detection on a code-mixed meme dataset. Our experimental results show that CMCLIP outperforms existing models such as M3P and multilingual-CLIP, establishing state-of-the-art performance on code-mixed multimodal tasks. Although our data and framework focus on Hindi-English code-mixing, they can be extended to other code-mixed language settings.
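The abstract does not spell out the teacher training objective, so the following is only a minimal sketch of one common form of cross-lingual teacher training: the knowledge-distillation recipe of Reimers and Gurevych (2020), which multilingual-CLIP also builds on. It assumes a frozen teacher (e.g. CLIP's English text encoder) and a student text encoder trained so that both an English caption and its parallel code-mixed (Hinglish) version land on the teacher's English embedding; retrieval is then scored with text-to-image Recall@K. The function names (`distillation_loss`, `recall_at_k`) and the toy vectors are hypothetical stand-ins, not the authors' code or data, and CMCLIP's actual objective may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length embedding vectors."""
    assert len(a) == len(b), "embeddings must have the same dimension"
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_en: Vector, student_en: Vector, student_cm: Vector) -> float:
    """Hypothetical teacher-student objective (assumption, after Reimers & Gurevych 2020):
    the student should map both the English sentence and its parallel
    code-mixed sentence onto the frozen teacher's English embedding."""
    return mse(student_en, teacher_en) + mse(student_cm, teacher_en)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, as used to compare CLIP-style text and image embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def recall_at_k(text_embs: List[Vector], image_embs: List[Vector], k: int) -> float:
    """Text-to-image Recall@K, assuming text i is paired with image i."""
    hits = 0
    for i, t in enumerate(text_embs):
        scores = [cosine(t, img) for img in image_embs]
        # Indices of the k most similar images, best first.
        top_k = sorted(range(len(image_embs)), key=lambda j: scores[j], reverse=True)[:k]
        if i in top_k:
            hits += 1
    return hits / len(text_embs)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real encoder outputs (not from the paper).
    teacher_en = [1.0, 0.0, 0.0]
    student_en = [0.9, 0.1, 0.0]
    student_cm = [0.8, 0.2, 0.0]
    print(round(distillation_loss(teacher_en, student_en, student_cm), 4))  # prints 0.0333

    texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    images = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
    print(recall_at_k(texts, images, k=1))  # prints 1.0
```

The design choice this illustrates is that only the student text encoder needs to learn code-mixed input: because its outputs are pulled toward the teacher's English embedding space, it can be paired with an unchanged CLIP image encoder (ResNet or ViT) for retrieval.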

Anthology ID: 2024.icon-1.36
Volume: Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month: December
Year: 2024
Address: AU-KBC Research Centre, Chennai, India
Editors: Sobha Lalitha Devi, Karunesh Arora
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 311–323
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.icon-1.36/
Cite (ACL): Gitanjali Kumari, Arindam Chatterjee, Ashutosh Bajpai, Asif Ekbal, and Vinutha B. NarayanaMurthy. 2024. CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 311–323, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal): CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations (Kumari et al., ICON 2024)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.icon-1.36.pdf
