Sentence-Level Back-Transliteration of Romanized Indian Languages: Performance Analysis and Challenges
Saurabh Kumar, Dhruvkumar Babubhai Kakadiya, Sanasam Ranbir Singh, Sukumar Nandi
Abstract
The widespread use of Romanized text for Indian languages, particularly on social media platforms, poses significant challenges for natural language processing due to the lack of standardized orthography and the presence of contextual ambiguities. In this study, we explore sentence-level back-transliteration for 13 Indian languages, focusing on addressing the limitations of word-level models that fail to capture contextual dependencies. We evaluate state-of-the-art models, including fine-tuned LLaMA, mT5, and Multilingual Transformer models, comparing their performance against the baseline IndicXlit model. In addition, we conduct a comprehensive error analysis to gain deeper insights into model performance. Our results demonstrate that fine-tuned LLaMA and the proposed IndiXform model, specifically designed to leverage sentence-level context, significantly outperform zero-shot LLaMA and the IndicXlit baseline. These findings provide valuable insights into handling contextual ambiguities and enhancing the accuracy of back-transliteration systems for Indian languages.
- Anthology ID:
- 2026.lrec-main.61
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 818–827
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.61/
- Cite (ACL):
- Saurabh Kumar, Dhruvkumar Babubhai Kakadiya, Sanasam Ranbir Singh, and Sukumar Nandi. 2026. Sentence-Level Back-Transliteration of Romanized Indian Languages: Performance Analysis and Challenges. International Conference on Language Resources and Evaluation, main:818–827.
- Cite (Informal):
- Sentence-Level Back-Transliteration of Romanized Indian Languages: Performance Analysis and Challenges (Kumar et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.61.pdf