2020
pdf
abs
Linguistically Informed Hindi-English Neural Machine Translation
Vikrant Goyal
|
Pruthwik Mishra
|
Dipti Misra Sharma
Proceedings of the Twelfth Language Resources and Evaluation Conference
Hindi-English Machine Translation is a challenging problem, owing to multiple factors including the morphological complexity and relatively free word order of Hindi, in addition to the lack of sufficient parallel training data. Neural Machine Translation (NMT) is a rapidly advancing MT paradigm and has shown promising results for many language pairs, especially in large training data scenarios. To overcome the data sparsity issue caused by the lack of large parallel corpora for Hindi-English, we propose a method to employ additional linguistic knowledge which is encoded by different phenomena depicted by Hindi. We generalize the embedding layer of the state-of-the-art Transformer model to incorporate linguistic features like POS tag, lemma and morph features to improve the translation performance. We compare the results obtained on incorporating this knowledge with the baseline systems and demonstrate significant performance improvements. Although, the Transformer NMT models have a strong efficacy to learn language constructs, we show that the usage of specific features further help in improving the translation performance.
pdf
abs
Efficient Neural Machine Translation for Low-Resource Languages via Exploiting Related Languages
Vikrant Goyal
|
Sourav Kumar
|
Dipti Misra Sharma
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
A large percentage of the world’s population speaks a language of the Indian subcontinent, comprising languages from both Indo-Aryan (e.g. Hindi, Punjabi, Gujarati, etc.) and Dravidian (e.g. Tamil, Telugu, Malayalam, etc.) families. A universal characteristic of Indian languages is their complex morphology, which, when combined with the general lack of sufficient quantities of high-quality parallel data, can make developing machine translation (MT) systems for these languages difficult. Neural Machine Translation (NMT) is a rapidly advancing MT paradigm and has shown promising results for many language pairs, especially in large training data scenarios. Since the condition of large parallel corpora is not met for Indian-English language pairs, we present our efforts towards building efficient NMT systems between Indian languages (specifically Indo-Aryan languages) and English via efficiently exploiting parallel data from the related languages. We propose a technique called Unified Transliteration and Subword Segmentation to leverage language similarity while exploiting parallel data from related language pairs. We also propose a Multilingual Transfer Learning technique to leverage parallel data from multiple related languages to assist translation for low resource language pair of interest. Our experiments demonstrate an overall average improvement of 5 BLEU points over the standard Transformer-based NMT baselines.
pdf
abs
Contact Relatedness can help improve multilingual NMT: Microsoft STCI-MT @ WMT20
Vikrant Goyal
|
Anoop Kunchukuttan
|
Rahul Kejriwal
|
Siddharth Jain
|
Amit Bhagwat
Proceedings of the Fifth Conference on Machine Translation
We describe our submission for the English→Tamil and Tamil→English news translation shared task. In this submission, we focus on exploring if a low-resource language (Tamil) can benefit from a high-resource language (Hindi) with which it shares contact relatedness. We show utilizing contact relatedness via multilingual NMT can significantly improve translation quality for English-Tamil translation.
2019
pdf
abs
The IIIT-H Gujarati-English Machine Translation System for WMT19
Vikrant Goyal
|
Dipti Misra Sharma
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
This paper describes the Neural Machine Translation system of IIIT-Hyderabad for the Gujarati→English news translation shared task of WMT19. Our system is basedon encoder-decoder framework with attention mechanism. We experimented with Multilingual Neural MT models. Our experiments show that Multilingual Neural Machine Translation leveraging parallel data from related language pairs helps in significant BLEU improvements upto 11.5, for low resource language pairs like Gujarati-English
pdf
abs
LTRC-MT Simple & Effective Hindi-English Neural Machine Translation Systems at WAT 2019
Vikrant Goyal
|
Dipti Misra Sharma
Proceedings of the 6th Workshop on Asian Translation
This paper describes the Neural Machine Translation systems of IIIT-Hyderabad (LTRC-MT) for WAT 2019 Hindi-English shared task. We experimented with both Recurrent Neural Networks & Transformer architectures. We also show the results of our experiments of training NMT models using additional data via backtranslation.