CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences
Devansh Gautam, Prashant Kodali, Kshitij Gupta, Anmol Goel, Manish Shrivastava, Ponnurangam Kumaraguru
Abstract
Code-mixed languages are widely used in multilingual societies around the world, yet the resources needed to build robust systems for them lag behind. A major contributing factor is the informal nature of these languages, which makes it difficult to collect code-mixed data. In this paper, we present our system for Task 1 of CALCS 2021, which builds a machine translation system from English to Hinglish in a supervised setting. Translating in this direction can help expand the set of resources for several tasks by translating valuable datasets from high-resource languages. We propose to use mBART, a pre-trained multilingual sequence-to-sequence model, and to fully utilize its pre-training by transliterating the Roman-script Hindi words in the code-mixed sentences to Devanagari script. We evaluate how expanding the input by concatenating Hindi translations of the English sentences improves mBART’s performance. Our system achieves a BLEU score of 12.22 on the test set. Further, we perform a detailed error analysis of our proposed systems and explore the limitations of the provided dataset and metrics.
- Anthology ID:
- 2021.calcs-1.7
- Volume:
- Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Thamar Solorio, Shuguang Chen, Alan W. Black, Mona Diab, Sunayana Sitaram, Victor Soto, Emre Yilmaz, Anirudh Srinivasan
- Venue:
- CALCS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 47–55
- URL:
- https://preview.aclanthology.org/Author-page-Marten-During-lu/2021.calcs-1.7/
- DOI:
- 10.18653/v1/2021.calcs-1.7
- Cite (ACL):
- Devansh Gautam, Prashant Kodali, Kshitij Gupta, Anmol Goel, Manish Shrivastava, and Ponnurangam Kumaraguru. 2021. CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 47–55, Online. Association for Computational Linguistics.
- Cite (Informal):
- CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences (Gautam et al., CALCS 2021)
- PDF:
- https://preview.aclanthology.org/Author-page-Marten-During-lu/2021.calcs-1.7.pdf
- Code:
- devanshg27/cm_translation
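
The two preprocessing ideas named in the abstract (transliterating Roman-script Hindi words to Devanagari, and concatenating a Hindi translation to the English input) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny lookup table `ROMAN_TO_DEVANAGARI`, the helper names, and the `</s>` separator are all assumptions made for illustration, and a real system would rely on a proper transliteration tool and the tokenizer's own special tokens.

```python
# Hypothetical toy lexicon (assumption): a real pipeline would use a
# transliteration model or library rather than a hand-written table.
ROMAN_TO_DEVANAGARI = {
    "mujhe": "मुझे",
    "yeh": "यह",
    "pasand": "पसंद",
    "hai": "है",
}


def transliterate_hinglish(sentence: str) -> str:
    """Replace known Roman-script Hindi tokens with Devanagari.

    Tokens not found in the lexicon (for example English words) are kept as-is.
    """
    tokens = sentence.split()
    return " ".join(ROMAN_TO_DEVANAGARI.get(tok.lower(), tok) for tok in tokens)


def build_source(english: str, hindi: str, sep: str = " </s> ") -> str:
    """Concatenate an English sentence with its Hindi translation.

    The separator string is an assumption, not taken from the paper.
    """
    return english.strip() + sep + hindi.strip()


# Expanded model input: English sentence plus its Hindi translation.
source = build_source("I like this movie", "मुझे यह फिल्म पसंद है")

# Training target: the Hinglish sentence with Hindi words in Devanagari.
target = transliterate_hinglish("mujhe yeh movie pasand hai")

print(source)
print(target)
```

In this sketch the English word "movie" stays in Roman script while the surrounding Hindi words are mapped to Devanagari, which is the form a model pre-trained on Devanagari Hindi has already seen.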