Cross-lingual and Multilingual CLIP
Fredrik Carlsson, Philipp Eisen, Faton Rekathati, Magnus Sahlgren
Abstract
The long-standing endeavor of relating the textual and the visual domain recently underwent a pivotal breakthrough, as OpenAI released CLIP. This model distinguishes how well an English text corresponds with a given image with unprecedented accuracy. Trained via a contrastive learning objective over a huge dataset of 400M of images and captions, it is a work that is not easily replicated, especially for low resource languages. Capitalizing on the modularization of the CLIP architecture, we propose to use cross-lingual teacher learning to re-train the textual encoder for various non-English languages. Our method requires no image data and relies entirely on machine translation which removes the need for data in the target language. We find that our method can efficiently train a new textual encoder with relatively low computational cost, whilst still outperforming previous baselines on multilingual image-text retrieval.- Anthology ID:
- 2022.lrec-1.739
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association
- Note:
- Pages:
- 6848–6854
- Language:
- URL:
- https://aclanthology.org/2022.lrec-1.739
- DOI:
- Cite (ACL):
- Fredrik Carlsson, Philipp Eisen, Faton Rekathati, and Magnus Sahlgren. 2022. Cross-lingual and Multilingual CLIP. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6848–6854, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Cross-lingual and Multilingual CLIP (Carlsson et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.lrec-1.739.pdf
- Code
- FreddeFrallan/Multilingual-CLIP
- Data
- Flickr30k, XTD10