Mahendra Data


2021

ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair
Alham Fikri Aji | Radityo Eko Prasojo | Tirana Noor Fatyanosa | Philip Arthur | Suci Fitriany | Salma Qonitah | Nadhifa Zulfa | Tomi Santoso | Mahendra Data
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

To Optimize, or Not to Optimize, That Is the Question: TelU-KU Models for WMT21 Large-Scale Multilingual Machine Translation
Sari Dewi Budiwati | Tirana Fatyanosa | Mahendra Data | Dedy Rahman Wijaya | Patrick Adolf Telnoni | Arie Ardiyanti Suryani | Agus Pratondo | Masayoshi Aritsugi
Proceedings of the Sixth Conference on Machine Translation

We describe the TelU-KU models for large-scale multilingual machine translation covering five Southeast Asian languages (Javanese, Indonesian, Malay, Tagalog, and Tamil) and English. We explore variations of the hyperparameters of the flores101_mm100_175M model, using random search on 10% of the datasets, to improve the BLEU scores of all thirty language pairs. We submitted two models, TelU-KU-175M and TelU-KU-175M_HPO, with average BLEU scores of 12.46 and 13.19, respectively. Our models improve on most language pairs after hyperparameter optimization. We also identified three language pairs that obtained a BLEU score above 15 while using fewer than 70 sentences of the training dataset: Indonesian-Tagalog, Tagalog-Indonesian, and Malay-Tagalog.
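The abstract's setup (six languages giving thirty translation directions, and random search run on a 10% sample of the data) can be illustrated with a minimal sketch. This is not the authors' code: the search space, the helper names (`language_pairs`, `subsample`, `random_search`), and the `toy_evaluate` scoring function are all hypothetical stand-ins, with `toy_evaluate` replacing the real step of fine-tuning the model and measuring BLEU.

```python
import itertools
import random

LANGUAGES = ["Javanese", "Indonesian", "Malay", "Tagalog", "Tamil", "English"]


def language_pairs(languages):
    """Return every ordered (source, target) pair with source != target."""
    return list(itertools.permutations(languages, 2))


def subsample(dataset, fraction, rng):
    """Draw a random fraction of the dataset without replacement (at least one item)."""
    k = max(1, int(len(dataset) * fraction))
    return rng.sample(dataset, k)


def random_search(space, evaluate, n_trials, rng):
    """Sample n_trials random configurations and keep the best-scoring one."""
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space; the abstract does not list the tuned hyperparameters.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 3e-5, 1e-4, 3e-4],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "label_smoothing": [0.0, 0.1, 0.2],
}


def toy_evaluate(config):
    """Placeholder score; a real run would fine-tune on the subsample and compute BLEU."""
    return -(
        abs(config["learning_rate"] - 1e-4) * 1e4
        + abs(config["dropout"] - 0.1)
        + abs(config["label_smoothing"] - 0.1)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = language_pairs(LANGUAGES)
    print(len(pairs), "translation directions")
    tuning_set = subsample(list(range(1000)), 0.10, rng)  # stand-in for sentence pairs
    print(len(tuning_set), "examples used for tuning")
    print(random_search(SEARCH_SPACE, toy_evaluate, n_trials=20, rng=rng))
```

Random search samples each configuration independently, so the number of trials can be set by the available compute budget, and tuning on a small sample keeps each trial cheap at the cost of a noisier score.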