Neural Machine Translation for Sinhala-English Code-Mixed Text

Archchana Kugathasan, Sagara Sumathipala


Abstract
Code-mixing has become an increasingly common mode of communication among multilingual speakers. Most social media content in multilingual societies is written in code-mixed text. However, most current translation systems do not convert code-mixed text to a standard language. Most user-written code-mixed content on social media remains unprocessed due to the unavailability of linguistic resources such as parallel corpora. This paper proposes a Neural Machine Translation (NMT) model to translate Sinhala-English code-mixed text to the Sinhala language. Due to the limited resources available for Sinhala-English code-mixed (SECM) text, a parallel corpus is created from SECM sentences and Sinhala sentences. Sri Lankan social media sites contain SECM text more frequently than the standard languages. The model proposed for code-mixed text translation in this study combines an encoder-decoder framework with LSTM units and the teacher forcing algorithm. The translated sentences from the model are evaluated using the BLEU (Bilingual Evaluation Understudy) metric. Our model achieved a remarkable BLEU score for the translation.
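As an illustration of the teacher forcing idea named in the abstract, the following is a minimal sketch and not the authors' implementation. It shows only how a target sentence is commonly split into a decoder input and a decoder target, so that at each training step the decoder receives the ground-truth previous token rather than its own prediction. The boundary tokens `<s>` and `</s>`, the function name `teacher_forcing_pair`, and the placeholder tokens are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical boundary tokens; the paper does not specify which markers it uses.
START, END = "<s>", "</s>"


def teacher_forcing_pair(target_tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Build (decoder_input, decoder_target) for one training sentence.

    The decoder input is the reference sentence shifted right by one position,
    so step i is conditioned on the true token i-1 instead of the model's guess.
    """
    decoder_input = [START] + target_tokens
    decoder_target = target_tokens + [END]
    return decoder_input, decoder_target


if __name__ == "__main__":
    # Placeholder tokens standing in for a tokenized Sinhala reference sentence.
    reference = ["tok1", "tok2", "tok3"]
    dec_in, dec_out = teacher_forcing_pair(reference)
    for step, (given, expected) in enumerate(zip(dec_in, dec_out)):
        print(f"step {step}: decoder sees {given!r}, should predict {expected!r}")
```

In an LSTM encoder-decoder, the encoder's final state would initialize the decoder, and a pair like this would be fed to the decoder during training. At inference time no reference is available, so the decoder consumes its own previous prediction instead.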
Anthology ID:
2021.ranlp-1.82
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
718–726
URL:
https://aclanthology.org/2021.ranlp-1.82
Cite (ACL):
Archchana Kugathasan and Sagara Sumathipala. 2021. Neural Machine Translation for Sinhala-English Code-Mixed Text. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 718–726, Held Online. INCOMA Ltd.
Cite (Informal):
Neural Machine Translation for Sinhala-English Code-Mixed Text (Kugathasan & Sumathipala, RANLP 2021)
PDF:
https://preview.aclanthology.org/update-css-js/2021.ranlp-1.82.pdf