@inproceedings{nyalang-2026-beyond,
  title     = {Beyond Multilinguality: Typological Limitations in Multilingual Models for {M}eitei Language},
  author    = {Nyalang, Badal},
  editor    = {Vylomova, Ekaterina and
               Shcherbakov, Andrei and
               Rani, Priya},
  booktitle = {Proceedings of the 8th Workshop on Research in Computational Linguistic Typology and Multilingual {NLP}},
  month     = mar,
  year      = {2026},
  address   = {Rabat, Morocco},
  publisher = {Association for Computational Linguistics},
  url       = {https://preview.aclanthology.org/credits/2026.sigtyp-main.5/},
  pages     = {32--38},
  isbn      = {979-8-89176-374-6},
  abstract  = {We present MeiteiRoBERTa, the first publicly available monolingual RoBERTa-based language model for Meitei (Manipuri), a low-resource language spoken by over 1.8 million people in Northeast India. Trained from scratch on 76 million words of Meitei text in Bengali script, our model achieves a perplexity of 65.89, representing a 5.2{\texttimes} improvement over multilingual baselines BERT (341.56) and MuRIL (355.65). Through comprehensive evaluation on perplexity, tokenization efficiency, and semantic representation quality, we demonstrate that domain-specific pre training significantly outperforms general-purpose multilingual models for low-resource languages. Our model exhibits superior semantic understanding with 0.769 similarity separation compared to 0.035 for mBERT and near-zero for MuRIL, despite MuRIL{'}s better tokenization efficiency (fertility: 3.29 vs. 4.65). We publicly release the model, training code, and datasets to accelerate NLP research for Meitei and other underrepresented Northeast Indian languages.}
}

@comment{Webpage residue from the ACL Anthology export page, kept for reference:
  Markdown (Informal)
  [Beyond Multilinguality: Typological Limitations in Multilingual Models for Meitei Language](https://preview.aclanthology.org/credits/2026.sigtyp-main.5/) (Nyalang, SIGTYP 2026)
  ACL
}