@inproceedings{tran-etal-2026-representation,
  title     = "Representation-Aware Prompting for Zero-Shot {M}arathi Text Classification: {IPA}, {R}omanization, Repetition",
  author    = "Tran, Van-Hien and
               Vu, Huy Hien and
               Tanaka, Hideki and
               Utiyama, Masao",
  editor    = "Hettiarachchi, Hansi and
               Ranasinghe, Tharindu and
               Plum, Alistair and
               Rayson, Paul and
               Mitkov, Ruslan and
               Gaber, Mohamed and
               Premasiri, Damith and
               Tan, Fiona Anting and
               Uyangodage, Lasitha",
  booktitle = "Proceedings of the Second Workshop on Language Models for Low-Resource Languages ({L}o{R}es{LM} 2026)",
  month     = mar,
  year      = "2026",
  address   = "Rabat, Morocco",
  publisher = "Association for Computational Linguistics",
  url       = "https://preview.aclanthology.org/manual-author-scripts/2026.loreslm-1.37/",
  pages     = "436--443",
  isbn      = "979-8-89176-377-7",
  abstract  = "Large language models (LLMs) often underperform in zero-shot text classification for low-resource, non-Latin languages due to script and tokenization mismatches. We propose \textit{representation-aware prompting} for Marathi that augments the original script with International Phonetic Alphabet (IPA) transcriptions, romanization, or a repetition-based fallback when external converters are unavailable. Experiments with two instruction-tuned LLMs on Marathi sentiment analysis and hate detection show consistent gains over script-only prompting (up to +2.6 accuracy points). We further find that the most effective augmentation is model-dependent, and that combining all variants is not consistently beneficial, suggesting that concise, targeted cues are preferable in zero-shot settings.",
}
@comment{Informal markdown citation (from ACL Anthology page):
  [Representation-Aware Prompting for Zero-Shot Marathi Text Classification: IPA, Romanization, Repetition](https://preview.aclanthology.org/manual-author-scripts/2026.loreslm-1.37/) (Tran et al., LoResLM 2026)
  ACL
}