Abstract
Code-mixing has become an increasingly common mode of communication among multilingual speakers, and most social media content in multilingual societies is written in code-mixed text. However, most current translation systems do not convert code-mixed text into a standard language. Most user-written code-mixed content on social media remains unprocessed because linguistic resources such as parallel corpora are unavailable. This paper proposes a Neural Machine Translation (NMT) model that translates Sinhala-English code-mixed (SECM) text into Sinhala. Sri Lankan social media sites contain SECM text more frequently than text in the standard languages, yet few resources exist for it, so we create a parallel corpus of SECM sentences paired with Sinhala sentences. The proposed model combines an Encoder-Decoder framework built with LSTM units and the Teacher Forcing algorithm. The translated sentences are evaluated with the BLEU (Bilingual Evaluation Understudy) metric, and our model achieves a remarkable BLEU score on the translation task.
- Anthology ID: 2021.ranlp-1.82
- Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month: September
- Year: 2021
- Address: Held Online
- Venue: RANLP
- Publisher: INCOMA Ltd.
- Pages: 718–726
- URL: https://aclanthology.org/2021.ranlp-1.82
- Cite (ACL): Archchana Kugathasan and Sagara Sumathipala. 2021. Neural Machine Translation for Sinhala-English Code-Mixed Text. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 718–726, Held Online. INCOMA Ltd.
- Cite (Informal): Neural Machine Translation for Sinhala-English Code-Mixed Text (Kugathasan & Sumathipala, RANLP 2021)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/2021.ranlp-1.82.pdf
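The abstract names Teacher Forcing as part of training the LSTM Encoder-Decoder. As a minimal, framework-free sketch (not the authors' implementation), the loop below shows what changes when teacher forcing is on: at each step the decoder is fed the gold previous token rather than its own previous prediction. The function `unroll_decoder` and the `step` callable are hypothetical; `step` stands in for one LSTM decoder step (previous token and recurrent state in, predicted token and new state out), whereas a real decoder would also condition on the encoder output and emit a probability distribution over the vocabulary.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical stand-in for one decoder step:
# (previous token, recurrent state) -> (predicted next token, new state).
StepFn = Callable[[str, Any], Tuple[str, Any]]


def unroll_decoder(step: StepFn, init_state: Any, target: List[str],
                   teacher_forcing: bool) -> Tuple[List[str], List[str]]:
    """Run the decoder over one gold target sentence.

    `target` begins with a start token and ends with an end token. Returns the
    tokens fed to the decoder and the tokens it predicted; in training, the
    predictions would be scored against target[1:].
    """
    inputs: List[str] = []
    preds: List[str] = []
    state = init_state
    prev = target[0]  # decoding always starts from the start-of-sentence token
    for t in range(1, len(target)):
        inputs.append(prev)
        pred, state = step(prev, state)
        preds.append(pred)
        # Teacher forcing: the gold token target[t] becomes the next input.
        # Without it, the model's own (possibly wrong) prediction is fed back.
        prev = target[t] if teacher_forcing else pred
    return inputs, preds


if __name__ == "__main__":
    # Toy step that always predicts "x" and counts steps in its state.
    def toy_step(prev: str, state: int) -> Tuple[str, int]:
        return "x", state + 1

    gold = ["<s>", "w1", "w2", "w3", "</s>"]
    print(unroll_decoder(toy_step, 0, gold, teacher_forcing=True)[0])   # ['<s>', 'w1', 'w2', 'w3']
    print(unroll_decoder(toy_step, 0, gold, teacher_forcing=False)[0])  # ['<s>', 'x', 'x', 'x']
```

With teacher forcing, early mistakes cannot compound along the sequence during training, which tends to make training faster and more stable. At inference time no gold target exists, so the decoder must consume its own predictions; this train/test mismatch is commonly called exposure bias.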
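The abstract reports results in BLEU. The sketch below follows the standard corpus-level definition: clipped (modified) n-gram precisions for n = 1 to 4, combined by a uniform geometric mean and multiplied by a brevity penalty. It is an illustrative implementation, not the evaluation script used in the paper. The function names, whitespace tokenization, and the absence of smoothing are assumptions. Because reported BLEU depends on tokenization and toolkit settings, numbers from this sketch are not directly comparable to published scores.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: Sequence[Sequence[str]],
                references: Sequence[Sequence[Sequence[str]]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU with uniform weights and no smoothing.

    hypotheses[i] is one tokenized system output; references[i] is a list of
    one or more tokenized reference translations for it.
    """
    clipped = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams in the hypotheses, per order n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the
        # hypothesis (ties go to the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            max_ref: Counter = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)  # element-wise max over references
            # Clip each hypothesis n-gram count by its max count in any reference.
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: only output that is shorter than the references is penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    print(round(corpus_bleu([hyp], [[ref]]), 4))  # about 0.537, i.e. (1/12) ** 0.25
```

Counts are pooled over the whole corpus before the precisions are taken, which is why corpus BLEU is not the average of per-sentence BLEU scores. For Sinhala output, the tokenization applied before scoring matters as much as the formula.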