Automatic Syllabification for Manipuri language

Loitongbam Gyanendro Singh, Lenin Laitonjam, Sanasam Ranbir Singh


Abstract
Developing hand-crafted rules for syllabifying the words of a language is an expensive task. This paper proposes several data-driven methods for automatic syllabification of words written in the Manipuri language, one of the scheduled languages of India. First, we propose a language-independent rule-based approach formulated using entropy-based phonotactic segmentation. Second, we cast syllabification as a sequence labeling problem and evaluate several sequence labeling approaches on it. Third, we combine the sequence labeling and rule-based methods and investigate the performance of the resulting hybrid approach. Our experiments show that the proposed methods outperform the baseline rule-based method. Entropy-based phonotactic segmentation achieves a word accuracy of 96%, the CRF (a sequence labeling approach) achieves 97%, and the hybrid approach achieves 98%.
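To illustrate what treating syllabification as sequence labeling can look like, here is a minimal sketch. It is an assumption for illustration, not the authors' implementation: the B/I tag scheme, the function names, and the romanized example split are all hypothetical, and the abstract does not specify the label set or features actually used. Each character is tagged "B" if it begins a syllable and "I" otherwise, so a tagger such as a CRF only has to predict one label per character. Word accuracy is taken here to mean the fraction of words whose entire syllabification matches the reference.

```python
from typing import List


def syllables_to_labels(syllables: List[str]) -> List[str]:
    """Tag each character 'B' (begins a syllable) or 'I' (inside one)."""
    labels: List[str] = []
    for syl in syllables:
        if not syl:
            continue  # skip empty strings so no stray 'B' is emitted
        labels.extend(["B"] + ["I"] * (len(syl) - 1))
    return labels


def labels_to_syllables(word: str, labels: List[str]) -> List[str]:
    """Rebuild syllables from per-character B/I labels."""
    syllables: List[str] = []
    for ch, lab in zip(word, labels):
        if lab == "B" or not syllables:
            syllables.append(ch)   # start a new syllable
        else:
            syllables[-1] += ch    # extend the current syllable
    return syllables


def word_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Fraction of words whose whole syllabification matches the reference."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical romanized example; the split is illustrative only.
example = ["ma", "ni", "pu", "ri"]
tags = syllables_to_labels(example)
restored = labels_to_syllables("".join(example), tags)
```

Encoding boundaries as per-character tags keeps the mapping between a labeled sequence and a syllabified word lossless in both directions, which is what lets a generic sequence labeler be evaluated at the word level.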
Anthology ID:
C16-1034
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
349–357
URL:
https://aclanthology.org/C16-1034
Cite (ACL):
Loitongbam Gyanendro Singh, Lenin Laitonjam, and Sanasam Ranbir Singh. 2016. Automatic Syllabification for Manipuri language. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 349–357, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Automatic Syllabification for Manipuri language (Gyanendro Singh et al., COLING 2016)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/C16-1034.pdf