Mahendra Data
2021
ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair
Alham Fikri Aji | Radityo Eko Prasojo | Tirana Noor Fatyanosa | Philip Arthur | Suci Fitriany | Salma Qonitah | Nadhifa Zulfa | Tomi Santoso | Mahendra Data
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation
To Optimize, or Not to Optimize, That Is the Question: TelU-KU Models for WMT21 Large-Scale Multilingual Machine Translation
Sari Dewi Budiwati | Tirana Fatyanosa | Mahendra Data | Dedy Rahman Wijaya | Patrick Adolf Telnoni | Arie Ardiyanti Suryani | Agus Pratondo | Masayoshi Aritsugi
Proceedings of the Sixth Conference on Machine Translation
We describe the TelU-KU models for large-scale multilingual machine translation between five Southeast Asian languages (Javanese, Indonesian, Malay, Tagalog, and Tamil) and English. We explore variations of the hyperparameters of the flores101_mm100_175M model, using random search on 10% of the datasets, to improve the BLEU scores of all thirty language pairs. We submitted two models, TelU-KU-175M and TelU-KU-175M_HPO, with average BLEU scores of 12.46 and 13.19, respectively. Optimizing the hyperparameters improves BLEU in most language pairs. We also identified three language pairs that obtained BLEU scores above 15 while using fewer than 70 training sentences: Indonesian-Tagalog, Tagalog-Indonesian, and Malay-Tagalog.
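The procedure in the abstract has two parts. First, six languages give 6 × 5 = 30 directed translation pairs. Second, hyperparameters are tuned by random search on a 10% subset of the data. The sketch below illustrates both under stated assumptions and is not the authors' code. The search space (`lr`, `dropout`, `warmup_updates`) and its values are hypothetical, since the abstract does not name the tuned hyperparameters. The `evaluate` callback is a hypothetical stand-in for fine-tuning flores101_mm100_175M and computing average BLEU, which is not shown.

```python
import itertools
import random

# Five Southeast Asian languages plus English (ISO 639-1 codes).
LANGUAGES = ["jv", "id", "ms", "tl", "ta", "en"]

# Every ordered (source, target) pair with source != target: 6 * 5 = 30 directions.
PAIRS = list(itertools.permutations(LANGUAGES, 2))

# Hypothetical search space: names and ranges are illustrative assumptions,
# not the hyperparameters actually tuned for TelU-KU-175M_HPO.
SEARCH_SPACE = {
    "lr": [1e-5, 3e-5, 1e-4, 3e-4],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "warmup_updates": [500, 1000, 2000, 4000],
}


def subsample(corpus, fraction, rng):
    """Keep a random `fraction` of the sentence pairs (at least one)."""
    k = max(1, int(len(corpus) * fraction))
    return rng.sample(corpus, k)


def sample_config(rng):
    """Draw one configuration uniformly at random from SEARCH_SPACE."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(evaluate, corpus, n_trials=20, fraction=0.1, seed=0):
    """Return (best_config, best_score) over n_trials random configurations.

    The subset is drawn once, so every trial is scored on the same data.
    `evaluate(config, data)` is a caller-supplied (hypothetical) stand-in for
    "fine-tune on `data` and report average BLEU"; higher is better.
    """
    rng = random.Random(seed)
    data = subsample(corpus, fraction, rng)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config, data)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    # Toy corpus and toy objective (prefers lr=1e-4, dropout=0.1) in place of real training.
    toy_corpus = [(f"src {i}", f"tgt {i}") for i in range(1000)]

    def toy_evaluate(config, data):
        return -abs(config["lr"] - 1e-4) * 1e4 - abs(config["dropout"] - 0.1)

    print(len(PAIRS))  # 30
    best, score = random_search(toy_evaluate, toy_corpus)
    print(best, score)
```

Scoring all trials on one fixed subset keeps the comparison between configurations fair and makes each trial roughly ten times cheaper than using the full data. The configuration found this way would then be used to train on the full datasets.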