BanglaIPA: Towards Robust Text-to-IPA Transcription with Contextual Rewriting in Bengali
Jakir Hasan, Shrestha Datta, Md Saiful Islam, Shubhashis Roy Dipta, Ameya Debnath
Abstract
Despite its widespread use, Bengali lacks a robust automated International Phonetic Alphabet (IPA) transcription system that effectively supports both standard and regional dialectal text. Existing approaches struggle to handle regional variations and numerical expressions, and they generalize poorly to previously unseen words. To address these limitations, we propose BanglaIPA, a novel IPA generation system that integrates a character-based vocabulary with word-level alignment. The proposed system accurately handles Bengali numerals and demonstrates strong performance across regional dialects. BanglaIPA improves inference efficiency by using a precomputed word-to-IPA mapping dictionary for previously observed words. The system is evaluated on standard Bengali and six regional variants in the DUAL-IPA dataset. Experimental results show that BanglaIPA outperforms baseline IPA transcription models by 58.4–78.7% and achieves an overall mean word error rate of 11.4%, highlighting its robustness in phonetic transcription generation for Bengali.
- Anthology ID:
- 2026.loreslm-1.12
- Volume:
- Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
- Venue:
- LoResLM
- SIG:
- Publisher:
- Association for Computational Linguistics
- Pages:
- 132–139
- URL:
- https://preview.aclanthology.org/manual-author-scripts/2026.loreslm-1.12/
- Cite (ACL):
- Jakir Hasan, Shrestha Datta, Md Saiful Islam, Shubhashis Roy Dipta, and Ameya Debnath. 2026. BanglaIPA: Towards Robust Text-to-IPA Transcription with Contextual Rewriting in Bengali. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 132–139, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- BanglaIPA: Towards Robust Text-to-IPA Transcription with Contextual Rewriting in Bengali (Hasan et al., LoResLM 2026)
- PDF:
- https://preview.aclanthology.org/manual-author-scripts/2026.loreslm-1.12.pdf
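The abstract says BanglaIPA speeds up inference by reusing a precomputed word-to-IPA dictionary for previously observed words, and it reports accuracy as a word error rate (WER). The following is a minimal sketch of both ideas under stated assumptions, not the authors' implementation. `transcribe`, `word_error_rate`, and the `model` callback are hypothetical names. `model` stands in for the neural transcription model, whose interface the abstract does not describe. Whitespace tokenization and the standard word-level Levenshtein definition of WER are also assumptions. The dictionary entries are toy examples, not data from DUAL-IPA.

```python
from typing import Callable, Dict, List


def transcribe(text: str, ipa_dict: Dict[str, str],
               model: Callable[[str], str]) -> List[str]:
    """Return one IPA string per whitespace-separated word.

    Words present in `ipa_dict` (the precomputed word-to-IPA mapping) are
    looked up directly; only unseen words are sent to `model`, a
    hypothetical stand-in for the neural transcription model.
    """
    result = []
    for word in text.split():
        ipa = ipa_dict.get(word)
        if ipa is None:
            ipa = model(word)  # dictionary miss: fall back to the model
        result.append(ipa)
    return result


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[n][m] / max(n, 1)


if __name__ == "__main__":
    # Toy entries for illustration only; not taken from DUAL-IPA.
    toy_dict = {"আমি": "ami", "তুমি": "t̪umi"}
    unseen = []

    def toy_model(word: str) -> str:
        unseen.append(word)  # record which words needed the model
        return "<model-output>"

    hyp = transcribe("আমি তুমি আমি", toy_dict, toy_model)
    print(hyp, unseen)  # model never called: every word is in the dictionary
    print(word_error_rate(["ami", "t̪umi", "ami"], hyp))
```

The dictionary lookup is constant time per word, so the expensive model runs only on dictionary misses. This is the efficiency argument the abstract makes. Averaging per-sentence WER scores is one plausible reading of "mean word error rate", but the paper's exact evaluation protocol may differ.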