PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages

Aditya Vyawahare, Rahul Tangsali, Aditya Mandke, Onkar Litake, Dipali Kadam


Abstract
This paper summarizes our findings from the shared task on machine translation of Dravidian languages. As part of this shared task, we carried out neural machine translation for the following five language pairs: Kannada to Tamil, Kannada to Telugu, Kannada to Malayalam, Kannada to Sanskrit, and Kannada to Tulu. The datasets for each of the five language pairs were used to train various translation models, including Seq2Seq models such as LSTM, bidirectional LSTM, and Conv Seq2Seq, as well as state-of-the-art transformers trained from scratch and fine-tuned pre-trained models. For some models involving monolingual corpora, we also implemented backtranslation. The models’ accuracy was then tested on a part of the same dataset, using BLEU score as the evaluation metric.
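To make the evaluation metric concrete, the following is a minimal sketch of corpus-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty). It is an illustration only: the abstract does not say which BLEU implementation, tokenization, or smoothing was used, so the whitespace tokenization, the single reference per sentence, the lack of smoothing, and the function names `ngram_counts` and `corpus_bleu` are all assumptions.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption: sentences are tokenized by whitespace and no smoothing is applied.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(f"BLEU = {corpus_bleu(hyps, refs):.2f}")
```

In practice an established tool is usually preferable to a hand-written scorer, since tokenization and smoothing choices change the reported number.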
Anthology ID:
2022.dravidianlangtech-1.28
Volume:
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
177–183
URL:
https://aclanthology.org/2022.dravidianlangtech-1.28
DOI:
10.18653/v1/2022.dravidianlangtech-1.28
Cite (ACL):
Aditya Vyawahare, Rahul Tangsali, Aditya Mandke, Onkar Litake, and Dipali Kadam. 2022. PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 177–183, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages (Vyawahare et al., DravidianLangTech 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.dravidianlangtech-1.28.pdf
Video:
 https://preview.aclanthology.org/ingestion-script-update/2022.dravidianlangtech-1.28.mp4
Data:
IndicCorp, Samanantar