Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages

Poulami Ghosh, Raj Dabre, Pushpak Bhattacharyya


Abstract
Pre-trained language models (PLMs) are known to be susceptible to perturbations of their input text, but existing work does not explicitly focus on linguistically grounded attacks, which are subtle and more common in naturally occurring text. In this paper, we study whether PLMs are agnostic to such linguistically grounded attacks. To this end, we offer the first study of this question, covering several Indic languages and a range of downstream tasks. Our findings reveal that PLMs are susceptible to linguistic perturbations, albeit slightly less so than to non-linguistic attacks, which highlights that even these constrained attacks are effective. Moreover, we examine the implications of these results across languages spanning diverse language families and different scripts.
Anthology ID: 2025.findings-naacl.468
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8362–8396
URL: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.468/
Cite (ACL): Poulami Ghosh, Raj Dabre, and Pushpak Bhattacharyya. 2025. Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 8362–8396, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages (Ghosh et al., Findings 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.468.pdf