Md Tasmim Rahman Adib
2025
Robustness of LLMs to Transliteration Perturbations in Bangla
Fabiha Haider, Md Farhan Ishmam, Fariha Tanjim Shifat, Md Tasmim Rahman Adib, Md Fahim, Md Farhad Alam Bhuiyan
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
Bangla text on the internet often appears in mixed scripts that combine native Bangla characters with their Romanized transliterations. To be practically usable, language models should be robust to such naturally occurring script mixing. Our work investigates the robustness of current LLMs and Bangla language models under transliteration-based textual perturbations, which we create by augmenting portions of existing Bangla datasets with transliteration. Specifically, we replace words and sentences with their transliterations to emulate realistic script mixing, and we replace the top-k salient words to emulate adversarial script mixing. Our experiments reveal behavioral insights and robustness vulnerabilities in language models for Bangla, which can be crucial for deploying such models in real-world scenarios and for improving their overall robustness.
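The three perturbation types described above (word-level, sentence-level, and top-k salient word replacement) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper. The transliteration step is a hypothetical toy lexicon (`TOY_LEXICON`), not the authors' transliteration scheme. The per-token saliency scores are taken as a given input, since the abstract does not say how saliency is computed. All function names are illustrative.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical toy lexicon (assumption): Bangla word -> Romanized form.
# A real pipeline would use a proper transliteration system instead.
TOY_LEXICON: Dict[str, str] = {
    "আমি": "ami",
    "ভাত": "bhat",
    "খাই": "khai",
}


def transliterate(word: str) -> str:
    """Return the Romanized form if the toy lexicon knows it, else the word unchanged."""
    return TOY_LEXICON.get(word, word)


def random_word_perturb(tokens: Sequence[str], rate: float, seed: int = 0) -> List[str]:
    """Realistic word-level mixing: transliterate each word independently with probability `rate`."""
    rng = random.Random(seed)
    return [transliterate(t) if rng.random() < rate else t for t in tokens]


def sentence_perturb(sentences: Sequence[Sequence[str]], rate: float, seed: int = 0) -> List[List[str]]:
    """Realistic sentence-level mixing: transliterate whole sentences with probability `rate`."""
    rng = random.Random(seed)
    return [
        [transliterate(t) for t in s] if rng.random() < rate else list(s)
        for s in sentences
    ]


def top_k_salient_perturb(tokens: Sequence[str], saliency: Sequence[float], k: int) -> List[str]:
    """Adversarial mixing: transliterate the k tokens with the highest saliency scores.

    `saliency` is assumed to be precomputed elsewhere (hypothetical input).
    """
    if len(tokens) != len(saliency):
        raise ValueError("tokens and saliency must have the same length")
    # Rank token positions by saliency, highest first, and keep the top k.
    ranked = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)
    chosen = set(ranked[:k])
    return [transliterate(t) if i in chosen else t for i, t in enumerate(tokens)]


if __name__ == "__main__":
    sentence = ["আমি", "ভাত", "খাই"]
    print(top_k_salient_perturb(sentence, [0.1, 0.7, 0.2], k=1))  # ['আমি', 'bhat', 'খাই']
```

The random variants seed a private `random.Random` instance, so a given perturbed dataset can be regenerated exactly. The adversarial variant is deterministic once the saliency scores are fixed.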