EnKhCorp1.0: An English–Khasi Corpus
Sahinur Rahman Laskar, Abdullah Faiz Ur Rahman Khilji, Darsh Kaushik, Partha Pakray, Sivaji Bandyopadhyay
Abstract
In machine translation, corpus preparation is one of the crucial tasks, particularly for low-resource language pairs. In multilingual countries like India, machine translation plays a vital role in communication among people from various linguistic backgrounds. Online automatic translation systems from Google and Microsoft cover many languages but lack support for Khasi, which can therefore be considered a low-resource language. This paper gives an overview of the development of EnKhCorp1.0, a corpus for the English–Khasi pair, and implements baseline systems for English-to-Khasi and Khasi-to-English translation based on the neural machine translation approach.
- Anthology ID:
- 2021.mtsummit-loresmt.9
- Volume:
- Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)
- Month:
- August
- Year:
- 2021
- Address:
- Virtual
- Editors:
- John Ortega, Atul Kr. Ojha, Katharina Kann, Chao-Hong Liu
- Venue:
- LoResMT
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 89–95
- URL:
- https://aclanthology.org/2021.mtsummit-loresmt.9
- Cite (ACL):
- Sahinur Rahman Laskar, Abdullah Faiz Ur Rahman Khilji, Darsh Kaushik, Partha Pakray, and Sivaji Bandyopadhyay. 2021. EnKhCorp1.0: An English–Khasi Corpus. In Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), pages 89–95, Virtual. Association for Machine Translation in the Americas.
- Cite (Informal):
- EnKhCorp1.0: An English–Khasi Corpus (Laskar et al., LoResMT 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.mtsummit-loresmt.9.pdf