2025
DELAB-IIITM WMT25: Enhancing Low-Resource Machine Translation for Manipuri and Assamese
Dingku Oinam | Navanath Saharia
Proceedings of the Tenth Conference on Machine Translation
This paper describes DELAB-IIITM's submission to the WMT25 machine translation shared task. We participated in two sub-tasks of the Indic Translation Task, en↔as and en↔mn, covering Assamese (an Indo-Aryan language) and Manipuri (a Tibeto-Burman language), for a total of six translation directions: mn→en, en→mn, en→as, as→en, mn→as, and as→mn. Our approach fine-tunes the pretrained multilingual NLLB-200 model, a machine translation model developed by Meta AI as part of the No Language Left Behind (NLLB) project, through two main components: synthetic parallel corpus creation and strategic fine-tuning. The fine-tuning process uses strict data cleaning protocols, the Adafactor optimizer with a low learning rate (2e-5), 2 training epochs, train-test data splits to prevent overfitting, and the Seq2SeqTrainer framework. We translated the official test data into the target languages with our fine-tuned model. Experimental results show that our method improves BLEU scores for both language pairs. At the same time, the findings confirm that back-translation remains challenging, largely due to morphological complexity and limited data availability.
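The abstract mentions strict data cleaning protocols and a train-test split but does not list the rules used. As a minimal sketch, assuming common parallel-corpus heuristics (Unicode NFC normalization and filters for empty sides, overlong sentences, implausible length ratios, untranslated copies, and exact duplicates) plus a seeded random hold-out split, the stdlib Python below shows what such a preprocessing step might look like. The helper names clean_pairs and split_train_test and every threshold are hypothetical and not taken from the paper.

```python
import random
import unicodedata


def clean_pairs(pairs, max_len=200, max_ratio=3.0):
    """Filter (source, target) sentence pairs with simple, assumed heuristics."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize to Unicode NFC and collapse runs of whitespace.
        src = " ".join(unicodedata.normalize("NFC", src).split())
        tgt = " ".join(unicodedata.normalize("NFC", tgt).split())
        if not src or not tgt:
            continue  # drop pairs with an empty side
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # drop overly long sentences (hypothetical limit)
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # drop pairs whose token-length ratio looks implausible
        if src == tgt:
            continue  # drop untranslated copies
        if (src, tgt) in seen:
            continue  # drop exact duplicates
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


def split_train_test(pairs, test_fraction=0.1, seed=42):
    """Shuffle with a fixed seed and hold out a fraction as the test split."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy pairs with placeholder target strings (not real Assamese data).
    raw = [
        ("Good morning", "tgt sentence one"),
        ("Good morning", "tgt sentence one"),  # exact duplicate
        ("", "tgt sentence two"),              # empty source side
        ("Hi", "w1 w2 w3 w4 w5"),              # length ratio 5 > 3
    ]
    print(clean_pairs(raw))  # [('Good morning', 'tgt sentence one')]
```

In a full pipeline the retained pairs would then be tokenized for NLLB-200 and handed to Seq2SeqTrainer; that step is left out here because it depends on the Hugging Face Transformers setup.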
2021
DELab@IIITSM at ICON-2021 Shared Task: Identification of Aggression and Biasness Using Decision Tree
Maibam Debina | Navanath Saharia
Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification
This paper describes the system submitted by team DELab@IIITSM to sub-task 1 of the ICON-2021 Shared Task on multilingual gender-biased and communal language identification. We participated in two language-specific identification tasks (Meitei and Hindi) and one multilingual task covering Meitei, Hindi, and Bangla code-mixed with English. Our method consists of a well-designed pre-processing phase tailored to the dataset, the frequency-based feature extraction technique TF-IDF, which creates a feature vector for each instance, and a Decision Tree classifier. We obtained overall micro F1 scores of 0.629, 0.625, and 0.632 for the Hindi, Meitei, and multilingual datasets, respectively.
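The abstract names TF-IDF features, a Decision Tree classifier, and micro F1 but gives no implementation details. The stdlib Python sketch below computes L2-normalized TF-IDF vectors using the smoothed IDF formula that scikit-learn's TfidfVectorizer applies by default, idf = ln((1 + n) / (1 + df)) + 1, with plain lowercase whitespace tokenization as a simplifying assumption, and then computes micro-averaged F1. The function names and the toy labels "biased" and "neutral" are hypothetical. The tree itself is omitted; in practice the vectors could be passed to a classifier such as scikit-learn's DecisionTreeClassifier.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): L2-normalized TF-IDF rows, one per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vocab = sorted(df)
    # Smoothed IDF, matching scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def micro_f1(y_true, y_pred, labels):
    """Micro-averaged F1: pool TP/FP/FN counts across all labels."""
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    vocab, vecs = tfidf_vectors(["good movie", "bad movie", "good good plot"])
    print(vocab)  # ['bad', 'good', 'movie', 'plot']
    y_true = ["biased", "neutral", "neutral", "biased"]
    y_pred = ["biased", "biased", "neutral", "biased"]
    print(micro_f1(y_true, y_pred, labels=["biased", "neutral"]))  # 0.75
```

For single-label classification with every label listed, each misclassification counts as one false positive and one false negative, so micro F1 equals plain accuracy.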
2012
LuitPad: A fully Unicode compatible Assamese writing software
Navanath Saharia | Kishori M Konwar
Proceedings of the Second Workshop on Advances in Text Input Methods
2009
Part of Speech Tagger for Assamese Text
Navanath Saharia | Dhrubajyoti Das | Utpal Sharma | Jugal Kalita
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers