Siba Sankar Sahu
2025
A study on the language independent stemmer in the Indian language IR
Siba Sankar Sahu
|
Sukomal Pal
Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models
We explore and evaluate the effect of different language-independent stemmers in the information retrieval (IR) tasks with Indian languages such as Hindi, Gujarati, and English. The issue was examined from two points of view. Does a language-independent stemmer improve retrieval effectiveness in Indian languages IR? Which language-independent stemmer is the most suitable for different Indian languages? It is observed that stemming enhances retrieval efficiency in different Indian languages compared to the no stemming approaches. Among the different stemmers experimented with, the co-occurrence-based stemmer (SNS) performs the best and improves a mean average precision (MAP) score by 2.98% in Hindi, and 20.78% in Gujarati languages, respectively, whereas the graph-based stemmer (GRAS) performs the best and improves a MAP score by 5.83% in English.