Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration
Parth Patel, Manthan Mehta, Pushpak Bhattacharya, Arjun Atreya
Abstract
In this paper we present a novel transliteration technique based on Orthographic Syllable (OS) segmentation for low-resource Indian languages (ILs). Given that alignment has produced promising results in Statistical Machine Transliteration systems and that phonology plays an important role in transliteration, we introduce a new model that uses an alignment representation similar to that of IBM Model 3 to pre-process the tokenized input sequence and then uses pre-trained source and target OS-embeddings for training. We apply our model to transliteration from ILs to English and report accuracy based on Top-1 Exact Match. We also compare our accuracy with that of a previously proposed phrase-based model and report improvements.
- Anthology ID:
- 2020.icon-main.51
- Volume:
- Proceedings of the 17th International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2020
- Address:
- Indian Institute of Technology Patna, Patna, India
- Editors:
- Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
- Venue:
- ICON
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 373–378
- URL:
- https://aclanthology.org/2020.icon-main.51
- Cite (ACL):
- Parth Patel, Manthan Mehta, Pushpak Bhattacharya, and Arjun Atreya. 2020. Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 373–378, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration (Patel et al., ICON 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2020.icon-main.51.pdf
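
The abstract reports accuracy as Top-1 Exact Match. As a rough illustration of how such a metric is commonly computed, the sketch below counts an input as correct only when the highest-ranked candidate equals the reference string exactly. The function name `top1_exact_match`, the sample romanizations, and the choice to apply no case or whitespace normalization are assumptions made for this example; the abstract does not specify these details, and this is not the authors' evaluation code.

```python
from typing import Sequence


def top1_exact_match(
    predictions: Sequence[Sequence[str]],
    references: Sequence[str],
) -> float:
    """Return the fraction of inputs whose top-ranked candidate equals the reference.

    `predictions[i]` is a ranked candidate list for input i (best first).
    Comparison is a plain string equality check; no normalization is applied
    (an assumption of this sketch).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        1
        for candidates, reference in zip(predictions, references)
        if candidates and candidates[0] == reference
    )
    return hits / len(references)


# Hypothetical ranked outputs for three source words, with their references.
ranked = [["bharat", "bharath"], ["dilli", "delhi"], ["mumbai"]]
gold = ["bharat", "delhi", "mumbai"]

accuracy = top1_exact_match(ranked, gold)
print(f"Top-1 exact match: {accuracy:.3f}")
```

In this toy data the second input counts as a miss even though the reference appears at rank 2, which is what separates a Top-1 measure from a Top-k one.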