A study on the language independent stemmer in the Indian language IR

Siba Sankar Sahu, Sukomal Pal


Abstract
We explore and evaluate the effect of different language-independent stemmers on information retrieval (IR) tasks for the Indian languages Hindi and Gujarati, as well as English. We examine the issue from two points of view. Does a language-independent stemmer improve retrieval effectiveness in Indian-language IR? Which language-independent stemmer is most suitable for each language? We observe that stemming improves retrieval effectiveness in the languages studied compared with the no-stemming approach. Among the stemmers tested, the co-occurrence-based stemmer (SNS) performs best for Hindi and Gujarati, improving the mean average precision (MAP) score by 2.98% and 20.78%, respectively, whereas the graph-based stemmer (GRAS) performs best for English, improving the MAP score by 5.83%.
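For readers unfamiliar with the metric, the following is a minimal sketch of how a MAP score and a percentage improvement over a no-stemming baseline could be computed. It assumes the reported percentages are relative improvements, which the abstract does not state. The function names and the toy rankings are hypothetical and invented for illustration; they are not the authors' code or data.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant doc ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP: mean of per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


def relative_improvement(baseline: float, system: float) -> float:
    """Percentage gain of `system` over `baseline` (assumed interpretation)."""
    return 100.0 * (system - baseline) / baseline


# Toy relevance judgments and rankings, invented for illustration only.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
no_stem = {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2"]}
stemmed = {"q1": ["d1", "d3", "d2"], "q2": ["d2", "d1"]}

map_base = mean_average_precision(no_stem, qrels)
map_stem = mean_average_precision(stemmed, qrels)
print(f"MAP without stemming: {map_base:.4f}")
print(f"MAP with stemming:    {map_stem:.4f}")
print(f"Relative improvement: {relative_improvement(map_base, map_stem):.2f}%")
```

In practice, evaluations of this kind are usually run with a standard tool over official relevance judgments rather than hand-written code; the sketch only shows the arithmetic behind the reported figures.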
Anthology ID:
2025.globalnlp-1.20
Volume:
Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Sudhansu Bala Das, Pruthwik Mishra, Alok Singh, Shamsuddeen Hassan Muhammad, Asif Ekbal, Uday Kumar Das
Venues:
GlobalNLP | WS
Publisher:
INCOMA Ltd., Shoumen, BULGARIA
Pages:
181–189
URL:
https://preview.aclanthology.org/corrections-2026-01/2025.globalnlp-1.20/
Cite (ACL):
Siba Sankar Sahu and Sukomal Pal. 2025. A study on the language independent stemmer in the Indian language IR. In Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models, pages 181–189, Varna, Bulgaria. INCOMA Ltd., Shoumen, BULGARIA.
Cite (Informal):
A study on the language independent stemmer in the Indian language IR (Sahu & Pal, GlobalNLP 2025)
PDF:
https://preview.aclanthology.org/corrections-2026-01/2025.globalnlp-1.20.pdf