Vandan Mujadia


2021

Domain Adaptation for Hindi-Telugu Machine Translation Using Domain Specific Back Translation
Hema Ala | Vandan Mujadia | Dipti Sharma
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In this paper, we present a novel approach for domain adaptation in Neural Machine Translation which aims to improve the translation quality over a new domain. Adapting to new domains is a highly challenging task for Neural Machine Translation on limited data; it becomes even more difficult for technical domains such as Chemistry and Artificial Intelligence due to specific terminology, etc. We propose the Domain Specific Back Translation method, which uses available monolingual data and generates synthetic data in a different way. This approach uses out-of-domain words. The approach is very generic and can be applied to any language pair for any domain. We conduct our experiments on the Chemistry and Artificial Intelligence domains for Hindi and Telugu in both directions. It has been observed that the usage of synthetic data created by the proposed algorithm improves the BLEU scores significantly.

English-Marathi Neural Machine Translation for LoResMT 2021
Vandan Mujadia | Dipti Misra Sharma
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

In this paper, we (team oneNLP-IIITH) describe our Neural Machine Translation approaches for English-Marathi (both directions) for LoResMT 2021. We experimented with Transformer-based Neural Machine Translation and explored the use of different linguistic features, such as POS and morphological features, on subword units for both English-Marathi and Marathi-English. In addition, we explored forward and backward translation using web-crawled monolingual data. We obtained BLEU scores of 22.2 (2nd overall) and 31.3 (1st overall) for English-Marathi and Marathi-English, respectively.

Low Resource Similar Language Neural Machine Translation for Tamil-Telugu
Vandan Mujadia | Dipti Sharma
Proceedings of the Sixth Conference on Machine Translation

This paper describes the participation of team oneNLP (LTRC, IIIT-Hyderabad) in the WMT 2021 similar language translation task. We experimented with Transformer-based Neural Machine Translation and explored the use of language similarity for Tamil-Telugu and Telugu-Tamil. As exploratory experiments, we also tried different subword configurations, script conversion, and training a single model for both directions.

2020

NMT based Similar Language Translation for Hindi - Marathi
Vandan Mujadia | Dipti Sharma
Proceedings of the Fifth Conference on Machine Translation

This paper describes the participation of team F1toF6 (LTRC, IIIT-Hyderabad) in the WMT 2020 similar language translation task. We experimented with an attention-based recurrent neural network architecture (seq2seq) for this task. We explored the use of different linguistic features, such as POS and morphological features, along with back translation for Hindi-Marathi and Marathi-Hindi machine translation.

Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task
Dipti Misra Sharma | Asif Ekbal | Karunesh Arora | Sudip Kumar Naskar | Dipankar Ganguly | Sobha L | Radhika Mamidi | Sunita Arora | Pruthwik Mishra | Vandan Mujadia
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task

Proceedings of the 17th International Conference on Natural Language Processing (ICON): TermTraction 2020 Shared Task
Dipti Misra Sharma | Asif Ekbal | Karunesh Arora | Sudip Kumar Naskar | Dipankar Ganguly | Sobha L | Radhika Mamidi | Sunita Arora | Pruthwik Mishra | Vandan Mujadia
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TermTraction 2020 Shared Task

Proceedings of the 17th International Conference on Natural Language Processing (ICON): Adap-MT 2020 Shared Task
Dipti Misra Sharma | Asif Ekbal | Karunesh Arora | Sudip Kumar Naskar | Dipankar Ganguly | Sobha L | Radhika Mamidi | Sunita Arora | Pruthwik Mishra | Vandan Mujadia
Proceedings of the 17th International Conference on Natural Language Processing (ICON): Adap-MT 2020 Shared Task

2019

Arabic Dialect Identification for Travel and Twitter Text
Pruthwik Mishra | Vandan Mujadia
Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper presents the results of the experiments done as part of the MADAR Shared Task in WANLP 2019 on Arabic Fine-Grained Dialect Identification. Dialect Identification is one of the prominent tasks in the field of Natural Language Processing, since subsequent language modules can be improved based on it. We explored the use of different features, such as character and word n-grams and language model probabilities, with different classifiers. Results show that these features help to improve dialect classification accuracy. Results also show that traditional machine learning classifiers tend to perform better than neural network models on this task in a low-resource setting.

A3-108 Machine Translation System for LoResMT 2019
Saumitra Yadav | Vandan Mujadia | Manish Shrivastava
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages

2017

POS Tagging For Resource Poor Languages Through Feature Projection
Pruthwik Mishra | Vandan Mujadia | Dipti Misra Sharma
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

Coreference Annotation Scheme and Relation Types for Hindi
Vandan Mujadia | Palash Gupta | Dipti Misra Sharma
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes a coreference annotation scheme for Hindi, along with coreference-annotation-specific issues and their solutions through our proposed scheme. We introduce different coreference relation types between continuous mentions of the same coreference chain, such as “Part-of” and “Function-value pair”. We used Jaccard-similarity-based Krippendorff's alpha to demonstrate consistency in the annotation scheme, the annotation, and the corpora. To ease the coreference annotation process, we built a semi-automatic Coreference Annotation Tool (CAT). We also provide statistics of coreference annotation on the Hindi Dependency Treebank (HDTB).

2013

A Hybrid Approach for Anaphora Resolution in Hindi
Praveen Dakwale | Vandan Mujadia | Dipti M Sharma
Proceedings of the Sixth International Joint Conference on Natural Language Processing