Insights from Romanized Manipuri Social Media Text: A Transliteration Corpus and Variation Analysis

Maisang Kamei Salice, Sanasam Ranbir Singh, Priyankoo Sarmah


Abstract
This paper presents the first large-scale study of Romanized Manipuri, a low-resource Indic language widely used by native speakers on social media. Social media text is highly informal and often noisy, posing challenges for natural language processing tasks; therefore, normalization through back-transliteration is essential. We construct a Romanized Manipuri to Manipuri–Bengali script back-transliteration corpus from YouTube comments, capturing diverse informal writing styles and orthographic variations. The dataset is analyzed to examine variation patterns at two levels: character-level inconsistencies and pragmatic stylistic variations influenced by user writing behavior. We also compare social media romanization with formal transliteration conventions, including standardized romanization schemes and textbook-based systems. Furthermore, we evaluate a Transformer model at both the character and subword levels and conduct a detailed error analysis to identify key challenges affecting back-transliteration performance.
Anthology ID:
2026.lrec-main.147
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
1878–1888
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.147/
Cite (ACL):
Maisang Kamei Salice, Sanasam Ranbir Singh, and Priyankoo Sarmah. 2026. Insights from Romanized Manipuri Social Media Text: A Transliteration Corpus and Variation Analysis. International Conference on Language Resources and Evaluation, main:1878–1888.
Cite (Informal):
Insights from Romanized Manipuri Social Media Text: A Transliteration Corpus and Variation Analysis (Salice et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.147.pdf