TigerLLM - A Family of Bangla Large Language Models

Nishat Raihan, Marcos Zampieri


Abstract
The development of Large Language Models (LLMs) remains heavily skewed towards English and a few other high-resource languages. This linguistic disparity is particularly evident for Bangla - the 5th most spoken language. A few initiatives have attempted to create open-source Bangla LLMs, but their performance still lags behind high-resource languages and their reproducibility is limited. To address this gap, we introduce TigerLLM - a family of Bangla LLMs. Our results demonstrate that these models surpass all open-source alternatives and also outperform larger proprietary models like GPT-3.5 across standard benchmarks, establishing TigerLLM as the new baseline for future Bangla language modeling.
Anthology ID:
2025.acl-short.69
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
887–896
URL:
https://preview.aclanthology.org/landing_page/2025.acl-short.69/
Cite (ACL):
Nishat Raihan and Marcos Zampieri. 2025. TigerLLM - A Family of Bangla Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 887–896, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
TigerLLM - A Family of Bangla Large Language Models (Raihan & Zampieri, ACL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.acl-short.69.pdf