NTTSU at WMT2024 General Translation Task
Minato Kondo, Ryo Fukuda, Xiaotian Wang, Katsuki Chousa, Masato Nishimura, Kosei Buma, Takatomo Kano, Takehito Utsuro
Abstract
The NTTSU team’s submission leverages several large language models developed through a training procedure that includes continual pre-training and supervised fine-tuning. For paragraph-level translation, we generated synthetic paragraph-aligned data and used it for training. In the Japanese-to-Chinese translation task, we focused in particular on speech-domain translation. Specifically, we built Whisper models for Japanese automatic speech recognition (ASR), using the YODAS dataset for Whisper training. Since this data contained many noisy data pairs, we combined the Whisper outputs using ROVER to polish the transcriptions. Furthermore, to make the translation model more robust to errors in the transcriptions, we performed data augmentation by forward translation from audio, using both ASR and base translation models. To select the best translation from the models' multiple hypotheses, we applied Minimum Bayes Risk (MBR) decoding and reranking, incorporating scores such as COMET-QE, COMET, and LaBSE cosine similarity.
- Anthology ID:
- 2024.wmt-1.20
- Volume:
- Proceedings of the Ninth Conference on Machine Translation
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 270–279
- URL:
- https://preview.aclanthology.org/add-emnlp-2024-awards/2024.wmt-1.20/
- DOI:
- 10.18653/v1/2024.wmt-1.20
- Cite (ACL):
- Minato Kondo, Ryo Fukuda, Xiaotian Wang, Katsuki Chousa, Masato Nishimura, Kosei Buma, Takatomo Kano, and Takehito Utsuro. 2024. NTTSU at WMT2024 General Translation Task. In Proceedings of the Ninth Conference on Machine Translation, pages 270–279, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- NTTSU at WMT2024 General Translation Task (Kondo et al., WMT 2024)
- PDF:
- https://preview.aclanthology.org/add-emnlp-2024-awards/2024.wmt-1.20.pdf
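
The abstract mentions selecting the best translation from multiple hypotheses with Minimum Bayes Risk decoding. A minimal sketch of that selection step is given below. It is not the authors' implementation: the utility function `overlap_f1` is a toy token-overlap F1 standing in for a learned metric such as COMET, and the names `overlap_f1`, `mbr_select`, and `candidates` are hypothetical. The reranking with COMET-QE and LaBSE scores is not shown.

```python
from collections import Counter


def overlap_f1(hyp: str, ref: str) -> float:
    """Toy utility: token-level F1 between two strings.

    Assumption: this stands in for a learned metric such as COMET.
    """
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    common = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(hyp_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mbr_select(hypotheses, utility=overlap_f1):
    """Return the hypothesis with the highest mean utility against all others.

    Each candidate is scored by treating every other candidate as a
    pseudo-reference, which is the usual sampling-based MBR approximation.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    if len(hypotheses) == 1:
        return hypotheses[0]

    def expected_utility(i):
        others = [h for j, h in enumerate(hypotheses) if j != i]
        return sum(utility(hypotheses[i], o) for o in others) / len(others)

    best_index = max(range(len(hypotheses)), key=expected_utility)
    return hypotheses[best_index]


# Hypothetical candidate translations, for illustration only.
candidates = [
    "the cat sat on the mat",
    "a cat sat on the mat",
    "the dog ran in the park",
]
print(mbr_select(candidates))
```

With these toy candidates, the first sentence is selected because it agrees most, on average, with the other two. Swapping `utility` for a neural metric changes the scores but not the selection logic.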