IITP-MT at CALCS2021: English to Hinglish Neural Machine Translation using Unsupervised Synthetic Code-Mixed Parallel Corpus
Ramakrishna Appicharla, Kamal Kumar Gupta, Asif Ekbal, Pushpak Bhattacharyya
Abstract
This paper describes the system submitted by IITP-MT team to Computational Approaches to Linguistic Code-Switching (CALCS 2021) shared task on MT for English→Hinglish. We submit a neural machine translation (NMT) system which is trained on the synthetic code-mixed (cm) English-Hinglish parallel corpus. We propose an approach to create code-mixed parallel corpus from a clean parallel corpus in an unsupervised manner. It is an alignment based approach and we do not use any linguistic resources for explicitly marking any token for code-switching. We also train NMT model on the gold corpus provided by the workshop organizers augmented with the generated synthetic code-mixed parallel corpus. The model trained over the generated synthetic cm data achieves 10.09 BLEU points over the given test set.- Anthology ID:
- 2021.calcs-1.5
- Volume:
- Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Thamar Solorio, Shuguang Chen, Alan W. Black, Mona Diab, Sunayana Sitaram, Victor Soto, Emre Yilmaz, Anirudh Srinivasan
- Venue:
- CALCS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 31–35
- Language:
- URL:
- https://aclanthology.org/2021.calcs-1.5
- DOI:
- 10.18653/v1/2021.calcs-1.5
- Cite (ACL):
- Ramakrishna Appicharla, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Bhattacharyya. 2021. IITP-MT at CALCS2021: English to Hinglish Neural Machine Translation using Unsupervised Synthetic Code-Mixed Parallel Corpus. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 31–35, Online. Association for Computational Linguistics.
- Cite (Informal):
- IITP-MT at CALCS2021: English to Hinglish Neural Machine Translation using Unsupervised Synthetic Code-Mixed Parallel Corpus (Appicharla et al., CALCS 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.calcs-1.5.pdf