Human–LLM Benchmarks for Bangla Dialect Translation: Sylheti and Chittagonian on the BanglaCHQ-Summ Corpus
Nowshin Mahjabin, Ahmed Shafin Ruhan, Mehreen Chowdhury, Md Fahim, MD Azam Hossain
Abstract
Millions of people in Bangladesh speak the Sylheti and Chittagonian (Chatgaiyya) dialects, yet most public health guidance exists only in Standard Bangla, which creates comprehension barriers and safety risks. Ad-hoc translation further harms comprehension, and scarce data, non-standard spelling, medical terms, numerals, and idioms make accurate translation difficult. We present BanglaCHQ-Prantik, the first benchmark for this setting, which extends BanglaCHQ-Summ with human gold references from 17 native translators. We evaluate Qwen 2.5 3B, Gemma 3 1B, GPT-4o mini, and Gemini 2.5 Flash under zero-shot, one-shot, five-shot, and chain-of-thought prompts, using BLEU, ROUGE-1/2/L, and METEOR. The closed-source models (GPT-4o mini and Gemini 2.5 Flash) lead overall, with Gemini 2.5 Flash the strongest. Few-shot prompting helps, especially for Sylheti, though errors persist with terminology, numerals, and idioms. The dataset is designed to support both NLP research and public health communication by enabling reliable translation across regional Bangla dialects. To our knowledge, this is the first medical-domain dataset for Sylheti and Chittagonian.
- Anthology ID:
- 2025.banglalp-1.18
- Volume:
- Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Naeemul Hassan, Enamul Hoque Prince, Mohiuddin Tasnim, Md Rashad Al Hasan Rony, Md Tahmid Rahman Rahman
- Venues:
- BanglaLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 223–236
- URL:
- https://preview.aclanthology.org/old-master/2025.banglalp-1.18/
- Cite (ACL):
- Nowshin Mahjabin, Ahmed Shafin Ruhan, Mehreen Chowdhury, Md Fahim, and MD Azam Hossain. 2025. Human–LLM Benchmarks for Bangla Dialect Translation: Sylheti and Chittagonian on the BanglaCHQ-Summ Corpus. In Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025), pages 223–236, Mumbai, India. Association for Computational Linguistics.
- Cite (Informal):
- Human–LLM Benchmarks for Bangla Dialect Translation: Sylheti and Chittagonian on the BanglaCHQ-Summ Corpus (Mahjabin et al., BanglaLP 2025)
- PDF:
- https://preview.aclanthology.org/old-master/2025.banglalp-1.18.pdf