Abdullah Faiz Ur Rahman Khilji Darsh Kaushik


2021

EnKhCorp1.0: An English–Khasi Corpus
Sahinur Rahman Laskar | Abdullah Faiz Ur Rahman Khilji | Darsh Kaushik | Partha Pakray | Sivaji Bandyopadhyay
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

In machine translation, corpus preparation is a crucial task, particularly for low-resource language pairs. In multilingual countries like India, machine translation plays a vital role in communication among people with various linguistic backgrounds. Online automatic translation systems from Google and Microsoft cover many languages but lack support for Khasi, which can hence be considered low-resource. This paper gives an overview of the development of EnKhCorp1.0, a corpus for the English–Khasi pair, and describes baseline systems implemented for English-to-Khasi and Khasi-to-English translation based on the neural machine translation approach.