BanSuite: A Unified Toolkit and Software Platform for Low-Resource NLP in Bangla

Md. Abu Sayed, Faisal Ahamed Khan, Jannatul Ferdous Tuli, Nabeel Mohammed, Mohammad Ruhul Amin, Mohammad Mamun Or Rashid


Abstract
Bangla is one of the world’s most widely spoken languages, yet it remains significantly under-resourced in natural language processing (NLP). Existing efforts have focused on isolated tasks such as Part-of-Speech (POS) tagging and Named Entity Recognition (NER), but comprehensive, integrated systems for core NLP tasks, including shallow parsing and dependency parsing, are largely absent. To address this gap, we present BanSuite, a unified Bangla NLP ecosystem developed under the EBLICT project. BanSuite combines a large-scale, manually annotated Bangla Treebank with high-quality pretrained models for POS tagging, NER, shallow parsing, and dependency parsing, achieving strong in-domain baseline performance (POS: 90.16 F1, NER: 90.11 F1, SP: 86.92 F1, DP: 90.27 UAS). The system is accessible through a Python toolkit (Bkit) and a web application, giving both researchers and non-technical users access to robust NLP functionality, including tokenization, normalization, lemmatization, and syntactic parsing. In benchmarks against existing Bangla NLP tools and multilingual Large Language Models (LLMs), BanSuite demonstrates superior task performance while remaining efficient in resource usage. By offering the first comprehensive, open, and integrated NLP platform for Bangla, BanSuite lays a scalable foundation for research, application development, and further advancement of low-resource language technologies. A demonstration video illustrating the system’s functionality is available at https://youtu.be/3pcfiUQfCoA.
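The abstract reports dependency parsing quality as unlabeled attachment score (UAS), the percentage of tokens whose predicted syntactic head matches the gold head. The sketch below only recalls that standard definition. It is not BanSuite or Bkit code, and the function name and toy head indices are hypothetical illustrations.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_corpus: Sequence[Sequence[int]],
    pred_corpus: Sequence[Sequence[int]],
) -> float:
    """Corpus-level UAS: percentage of tokens whose predicted head equals the gold head.

    Hypothetical helper for illustration only (not part of Bkit/BanSuite).
    Each sentence is a list of head indices (1-based token positions, 0 = root).
    Tokens are pooled across sentences (a micro-average); some evaluations
    additionally exclude punctuation tokens, which this sketch does not do.
    """
    if len(gold_corpus) != len(pred_corpus):
        raise ValueError("corpora must contain the same number of sentences")
    correct = total = 0
    for gold, pred in zip(gold_corpus, pred_corpus):
        if len(gold) != len(pred):
            raise ValueError("sentence lengths differ between gold and prediction")
        # Count tokens whose predicted head index matches the gold head index.
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    if total == 0:
        raise ValueError("cannot score an empty corpus")
    return 100.0 * correct / total


# Toy example: two sentences, 7 tokens in total, 6 heads predicted correctly.
gold = [[2, 0, 2], [0, 1, 1, 3]]
pred = [[2, 0, 1], [0, 1, 1, 3]]
print(unlabeled_attachment_score(gold, pred))
```

Because UAS ignores dependency labels, it is always at least as high as the labeled attachment score (LAS) computed on the same predictions.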
Anthology ID:
2026.eacl-demo.44
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Danilo Croce, Jochen Leidner, Nafise Sadat Moosavi
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
609–620
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-demo.44/
Cite (ACL):
Md. Abu Sayed, Faisal Ahamed Khan, Jannatul Ferdous Tuli, Nabeel Mohammed, Mohammad Ruhul Amin, and Mohammad Mamun Or Rashid. 2026. BanSuite: A Unified Toolkit and Software Platform for Low-Resource NLP in Bangla. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 609–620, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
BanSuite: A Unified Toolkit and Software Platform for Low-Resource NLP in Bangla (Sayed et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-demo.44.pdf