First Attempt at Building Parallel Corpora for Machine Translation of Northeast India’s Very Low-Resource Languages
Atnafu Lambebo Tonja, Melkamu Mersha, Ananya Kalita, Olga Kolesnikova, Jugal Kalita
Abstract
This paper presents the creation of initial bilingual corpora for thirteen very low-resource languages of India, all from Northeast India. It also presents the results of initial translation efforts in these languages. It creates the first-ever parallel corpora for these languages and provides initial benchmark neural machine translation results for these languages. We intend to extend these corpora to include a large number of low-resource Indian languages and integrate the effort with our prior work with African and American-Indian languages to create corpora covering a large number of languages from across the world.
- Anthology ID:
- 2023.icon-1.49
- Volume:
- Proceedings of the 20th International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2023
- Address:
- Goa University, Goa, India
- Editors:
- Jyoti D. Pawar, Sobha Lalitha Devi
- Venue:
- ICON
- SIG:
- SIGLEX
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 534–539
- URL:
- https://aclanthology.org/2023.icon-1.49
- Cite (ACL):
- Atnafu Lambebo Tonja, Melkamu Mersha, Ananya Kalita, Olga Kolesnikova, and Jugal Kalita. 2023. First Attempt at Building Parallel Corpora for Machine Translation of Northeast India’s Very Low-Resource Languages. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 534–539, Goa University, Goa, India. NLP Association of India (NLPAI).
- Cite (Informal):
- First Attempt at Building Parallel Corpora for Machine Translation of Northeast India’s Very Low-Resource Languages (Tonja et al., ICON 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.icon-1.49.pdf