Linguistically Informed Hindi-English Neural Machine Translation

Vikrant Goyal; Pruthwik Mishra; Dipti Misra Sharma

Abstract

Hindi-English machine translation is a challenging problem, owing to the morphological complexity and relatively free word order of Hindi as well as the lack of sufficient parallel training data. Neural Machine Translation (NMT) is a rapidly advancing MT paradigm that has shown promising results for many language pairs, especially when large amounts of training data are available. To overcome the data sparsity caused by the lack of large Hindi-English parallel corpora, we propose a method that employs additional linguistic knowledge, encoded by different linguistic phenomena exhibited by Hindi. We generalize the embedding layer of the state-of-the-art Transformer model to incorporate linguistic features such as POS tags, lemmas, and morphological features in order to improve translation performance. We compare the results obtained by incorporating this knowledge against baseline systems and demonstrate significant performance improvements. Although Transformer NMT models are highly effective at learning language constructs, we show that using specific linguistic features further improves translation performance.
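
To make the idea of a generalized embedding layer concrete, here is a minimal sketch in plain Python. The abstract does not say how the per-feature embeddings are combined, so the sketch assumes concatenation, which is one common choice for factored inputs; the class name `FactoredEmbedding`, the feature names, the vocabulary sizes, and the vector widths are all hypothetical and not taken from the paper.

```python
import random


class FactoredEmbedding:
    """Illustrative only: one lookup table per feature, vectors concatenated per token."""

    def __init__(self, vocab_sizes, dims, seed=0):
        # vocab_sizes and dims map a feature name to its vocabulary size and vector width.
        rng = random.Random(seed)
        self.features = list(vocab_sizes)
        self.tables = {
            name: [[rng.uniform(-0.1, 0.1) for _ in range(dims[name])]
                   for _ in range(vocab_sizes[name])]
            for name in self.features
        }
        # Width of the concatenated vector that would be fed to the encoder.
        self.output_dim = sum(dims[name] for name in self.features)

    def embed_token(self, ids):
        # ids maps each feature name to the integer index of that feature for this token.
        vector = []
        for name in self.features:
            vector.extend(self.tables[name][ids[name]])
        return vector

    def embed_sentence(self, sentence):
        return [self.embed_token(ids) for ids in sentence]


# Hypothetical sizes, chosen only for illustration.
vocab_sizes = {"word": 50, "lemma": 40, "pos": 10, "morph": 20}
dims = {"word": 8, "lemma": 4, "pos": 2, "morph": 2}
layer = FactoredEmbedding(vocab_sizes, dims)

# Each token carries one index per feature (word form, lemma, POS tag, morph tag).
sentence = [
    {"word": 3, "lemma": 2, "pos": 1, "morph": 5},
    {"word": 7, "lemma": 7, "pos": 4, "morph": 0},
]
embedded = layer.embed_sentence(sentence)
print(len(embedded), len(embedded[0]))
```

Under this assumption, each token's vector is the word embedding followed by the lemma, POS, and morphological embeddings, so sparse word forms can share information through their denser linguistic features. Summing equal-width feature embeddings would be an alternative design; which variant the authors use is described in the paper itself.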

Anthology ID: 2020.lrec-1.456
Volume: Proceedings of the 12th Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 3698–3703
Language: English
URL: https://aclanthology.org/2020.lrec-1.456
Cite (ACL): Vikrant Goyal, Pruthwik Mishra, and Dipti Misra Sharma. 2020. Linguistically Informed Hindi-English Neural Machine Translation. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 3698–3703, Marseille, France. European Language Resources Association.
Cite (Informal): Linguistically Informed Hindi-English Neural Machine Translation (Goyal et al., LREC 2020)
PDF: https://preview.aclanthology.org/update-css-js/2020.lrec-1.456.pdf