Automatic Syllabification for Manipuri language
Loitongbam Gyanendro Singh, Lenin Laitonjam, Sanasam Ranbir Singh
Abstract
Developing hand-crafted rules for syllabifying the words of a language is an expensive task. This paper proposes several data-driven methods for automatic syllabification of words written in the Manipuri language, one of the scheduled languages of India. First, we propose a language-independent rule-based approach formulated using entropy-based phonotactic segmentation. Second, we cast syllabification as a sequence labeling problem and evaluate various sequence labeling approaches. Third, we combine the sequence labeling and rule-based methods and evaluate the resulting hybrid approach. Our experiments show that the proposed methods outperform the baseline rule-based method. Entropy-based phonotactic segmentation achieves a word accuracy of 96%, a CRF (a sequence labeling approach) achieves 97%, and the hybrid approach achieves 98%.
- Anthology ID:
- C16-1034
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Yuji Matsumoto, Rashmi Prasad
- Venue:
- COLING
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 349–357
- URL:
- https://aclanthology.org/C16-1034
- Cite (ACL):
- Loitongbam Gyanendro Singh, Lenin Laitonjam, and Sanasam Ranbir Singh. 2016. Automatic Syllabification for Manipuri language. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 349–357, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Automatic Syllabification for Manipuri language (Gyanendro Singh et al., COLING 2016)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/C16-1034.pdf