Insights from Romanized Manipuri Social Media Text: A Transliteration Corpus and Variation Analysis
Maisang Kamei Salice, Sanasam Ranbir Singh, Priyankoo Sarmah
Abstract
This paper presents the first large-scale study of Romanized Manipuri, a low-resource Indic language widely used by native speakers on social media. Social media text is highly informal and often noisy, posing challenges for natural language processing tasks; therefore, normalization through back-transliteration is essential. We construct a Romanized Manipuri to Manipuri–Bengali script back-transliteration corpus from YouTube comments, capturing diverse informal writing styles and orthographic variations. The dataset is analyzed to examine variation patterns at two levels: character-level inconsistencies and pragmatic stylistic variations influenced by user writing behavior. We also compare social media romanization with formal transliteration conventions, including standardized romanization schemes and textbook-based systems. Furthermore, we evaluate a Transformer model at both character and subword levels and conduct a detailed error analysis to identify key challenges affecting back-transliteration performance.
- Anthology ID:
- 2026.lrec-main.147
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resources Association
- Pages:
- 1878–1888
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.147/
- Cite (ACL):
- Maisang Kamei Salice, Sanasam Ranbir Singh, and Priyankoo Sarmah. 2026. Insights from Romanized Manipuri Social Media Text: A Transliteration Corpus and Variation Analysis. International Conference on Language Resources and Evaluation, main:1878–1888.
- Cite (Informal):
- Insights from Romanized Manipuri Social Media Text: A Transliteration Corpus and Variation Analysis (Salice et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.147.pdf