Jumi Sarmah


2019

Development of Assamese Rule based Stemmer using WordNet
Jumi Sarmah | Shikhar Kumar Sarma | Anup Kumar Barman
Proceedings of the 10th Global Wordnet Conference

Stemming is a technique that reduces an inflected word to its root form. Assamese is a morphologically rich, scheduled Indian language in which a variety of suffixes attach to a word depending on context. Normalizing such inflected words helps improve the performance of various Natural Language Processing applications. This paper develops a look-up and rule-based suffix-stripping approach for the Assamese language using WordNet. The authors build the dictionary from root words extracted from the Assamese WordNet and from Named Entities. Stemming rules for inflected nouns and verbs are added to the rule engine, and the stemmed output is then tested against the morphological root words of the Assamese WordNet and Named Entities by computing the Hamming distance. The resulting stemmer for the Assamese language achieves an accuracy of 85%. The authors also report the performance of an IR system that applies the Assamese stemmer and demonstrate its efficiency by retrieving sense-oriented results for the submitted query. The morphological analyzer thus lays the groundwork for research on various Assamese NLP applications.
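
The look-up, suffix-stripping, and Hamming-distance steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the romanized lexicon entries, the suffix list, the `max_distance` threshold, and the function names are all hypothetical placeholders, and restricting the Hamming comparison to equal-length roots is one possible reading of how the stemmed output is checked against the dictionary.

```python
# Hypothetical toy data: romanized placeholders, NOT the paper's dictionary or rules.
LEXICON = {"manuh", "kitap"}
SUFFIXES = ["bor", "khon", "e"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def stem(word: str, lexicon: set, suffixes: list, max_distance: int = 1) -> str:
    """Look-up first, then rule-based suffix stripping with a Hamming check."""
    # Step 1: dictionary look-up. A known root is returned unchanged.
    if word in lexicon:
        return word

    # Step 2: try suffix rules, longest suffix first.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            if candidate in lexicon:
                return candidate

            # Step 3 (assumption): accept the closest equal-length root
            # if it lies within max_distance of the stripped candidate.
            same_length = sorted(r for r in lexicon if len(r) == len(candidate))
            if same_length:
                best = min(same_length, key=lambda r: hamming(candidate, r))
                if hamming(candidate, best) <= max_distance:
                    return best

    # No rule applied: leave the word as it is.
    return word
```

With the toy data above, `stem("manuhbor", LEXICON, SUFFIXES)` strips the hypothetical suffix `bor` and finds the root directly in the lexicon, while a word whose stripped form is one character off from a root falls through to the Hamming-distance step.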

2014

A Quantitative Analysis of Synset of Assamese WordNet: Its Position and Timeline
Shikhar Sarma | Dibyajyoti Sarmah | Ratul Deka | Anup Barman | Jumi Sarmah | Himadri Bharali | Mayashree Mahanta | Umesh Deka
Proceedings of the Seventh Global Wordnet Conference

Assamese WordNet based Quality Enhancement of Bilingual Machine Translation System
Anup Barman | Jumi Sarmah | Shikhar Sarma
Proceedings of the Seventh Global Wordnet Conference