2022
Rakuten’s Participation in WAT 2022: Parallel Dataset Filtering by Leveraging Vocabulary Heterogeneity
Alberto Poncelas | Johanes Effendi | Ohnmar Htun | Sunil Yadav | Dongzhe Wang | Saurabh Jain
Proceedings of the 9th Workshop on Asian Translation
This paper introduces our neural machine translation system’s participation in the WAT 2022 shared translation task (team ID: sakura). We participated in the Parallel Data Filtering Task. Our approach, based on Feature Decay Algorithms (FDA), achieved +1.4 and +2.4 BLEU points for English-to-Japanese and Japanese-to-English respectively, compared to a model trained on the full dataset, showing the effectiveness of FDA for in-domain data selection.
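As a rough sketch of the FDA-style selection this abstract refers to (not the authors' implementation; the function names and the decay parameter below are illustrative assumptions), candidate sentences can be scored greedily by their n-gram overlap with an in-domain seed set, with each n-gram's weight decaying every time it is covered by an already selected sentence:

from collections import Counter

def ngrams(tokens, n_max=3):
    # All n-grams up to length n_max of a token list.
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def fda_select(seed_sentences, pool_sentences, k, decay=0.5):
    # Greedily pick k pool sentences by n-gram overlap with the seed
    # (in-domain) data, decaying a feature each time it is selected.
    weights = Counter()
    for sent in seed_sentences:
        for g in ngrams(sent.split()):
            weights[g] += 1.0

    counts = Counter()            # how often each n-gram was already covered
    remaining, selected = list(pool_sentences), []
    for _ in range(min(k, len(remaining))):
        def score(sent):
            toks = sent.split()
            value = sum(weights[g] * decay ** counts[g] for g in ngrams(toks))
            return value / max(len(toks), 1)   # length normalisation
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        for g in ngrams(best.split()):
            if g in weights:
                counts[g] += 1
    return selected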
Controlling Japanese Machine Translation Output by Using JLPT Vocabulary Levels
Alberto Poncelas | Ohnmar Htun
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
In Neural Machine Translation (NMT) systems, there is generally little control over the lexicon of the output. Consequently, the translated output may be too difficult for certain audiences. For example, for people with limited knowledge of the language, vocabulary is a major impediment to understanding a text. In this work, we build a complexity-controllable NMT system for English-to-Japanese translation. More particularly, we aim to modulate the difficulty of the translation in terms of not only the vocabulary but also the use of kanji. To achieve this, we follow a sentence-tagging approach to influence the output.
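A minimal sketch of the sentence-tagging idea described in this abstract: prepend a pseudo-token encoding the desired JLPT vocabulary level to each source sentence, so the model learns to condition output complexity on the tag. The tag names and the assign_jlpt_level helper below are illustrative assumptions, not the paper's actual preprocessing.

JLPT_TAGS = {1: "<jlpt_n1>", 2: "<jlpt_n2>", 3: "<jlpt_n3>", 4: "<jlpt_n4>", 5: "<jlpt_n5>"}

def tag_source(src_sentence: str, level: int) -> str:
    # Prepend the JLPT-level control token to an English source sentence.
    return f"{JLPT_TAGS[level]} {src_sentence}"

def build_training_pairs(parallel_corpus, assign_jlpt_level):
    # Tag each (en, ja) pair with the level inferred from its Japanese side.
    # assign_jlpt_level(ja_sentence) -> int in 1..5 is assumed to exist,
    # e.g. by looking up words and kanji in JLPT vocabulary lists.
    for en, ja in parallel_corpus:
        level = assign_jlpt_level(ja)
        yield tag_source(en, level), ja

At inference time, requesting a simpler translation is then just a matter of choosing the tag, e.g. tag_source("The meeting was postponed.", 5).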
2021
Rakuten’s Participation in WAT 2021: Examining the Effectiveness of Pre-trained Models for Multilingual and Multimodal Machine Translation
Raymond Hendy Susanto | Dongzhe Wang | Sunil Yadav | Mausam Jain | Ohnmar Htun
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
This paper introduces our neural machine translation systems’ participation in the WAT 2021 shared translation tasks (team ID: sakura). We participated in the (i) NICT-SAP, (ii) Japanese-English multimodal translation, (iii) Multilingual Indic, and (iv) Myanmar-English translation tasks. Multilingual approaches such as mBART (Liu et al., 2020) pre-train a complete multilingual sequence-to-sequence model through denoising objectives, making them a great starting point for building multilingual translation systems. Our main focus in this work is to investigate the effectiveness of multilingual fine-tuning of such a multilingual language model on various translation tasks, including low-resource, multimodal, and mixed-domain translation. We further explore a multimodal approach based on universal visual representation (Zhang et al., 2019) and compare its performance against a unimodal approach based on mBART alone.
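For orientation only, a minimal sketch of using a pre-trained mBART-family model for translation, assuming the Hugging Face transformers mBART-50 checkpoint; the checkpoint, toolkit, and fine-tuning setup used in the paper may differ.

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Assumed checkpoint for illustration; fine-tuning for a low-resource pair
# would continue training this same model on the task's parallel data.
name = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(name)
tokenizer = MBart50TokenizerFast.from_pretrained(name)

# Translate English -> Japanese by setting source/target language codes.
tokenizer.src_lang = "en_XX"
inputs = tokenizer("The shipment arrived this morning.", return_tensors="pt")
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["ja_XX"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))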
2020
Goku’s Participation in WAT 2020
Dongzhe Wang | Ohnmar Htun
Proceedings of the 7th Workshop on Asian Translation
This paper introduces our neural machine translation systems’ participation in WAT 2020 (team ID: goku20). We participated in the (i) Patent, (ii) Business Scene Dialogue (BSD) document-level translation, and (iii) Mixed-domain tasks. Despite their simplicity, standard Transformer models have proven very effective in many machine translation systems. Recently, several advanced pre-trained generative models based on the encoder-decoder framework have been proposed. The main focus of this work is to explore how robustly Transformer models perform in translation from the sentence level to the document level and from resource-rich to low-resource languages. Additionally, we investigate the improvements that fine-tuning on top of pre-trained Transformer-based models can achieve on the various tasks.
2019
Sarah’s Participation in WAT 2019
Raymond Hendy Susanto | Ohnmar Htun | Liling Tan
Proceedings of the 6th Workshop on Asian Translation
This paper describes our MT systems’ participation in WAT 2019. We participated in the (i) Patent, (ii) Timely Disclosure, (iii) Newswire, and (iv) Mixed-domain tasks. Our main focus is to explore how similar Transformer models perform on these various tasks. We observed that for tasks with smaller datasets, our best model setups are shallower models with fewer attention heads. We also investigated practical issues in NMT that often appear in production settings, such as coping with multilinguality and simplifying the pre- and post-processing pipeline in deployment.
2012
The NICT translation system for IWSLT 2012
Andrew Finch | Ohnmar Htun | Eiichiro Sumita
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign