@inproceedings{dabre-song-2024-nicts,
title = "{NICT}{'}s Cascaded and End-To-End Speech Translation Systems using Whisper and {I}ndic{T}rans2 for the {I}ndic Task",
author = "Dabre, Raj and
Song, Haiyue",
editor = "Salesky, Elizabeth and
Federico, Marcello and
Carpuat, Marine",
booktitle = "Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)",
month = aug,
year = "2024",
address = "Bangkok, Thailand (in-person and online)",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2024.iwslt-1.3/",
doi = "10.18653/v1/2024.iwslt-1.3",
pages = "17--22",
abstract = "This paper presents the NICT{'}s submission for the IWSLT 2024 Indic track, focusing on three speech-to-text (ST) translation directions: English to Hindi, Bengali, and Tamil. We aim to enhance translation quality in this low-resource scenario by integrating state-of-the-art pre-trained automated speech recognition (ASR) and text-to-text machine translation (MT) models. Our cascade system incorporates a Whisper model fine-tuned for ASR and an IndicTrans2 model fine-tuned for MT. Additionally, we propose an end-to-end system that combines a Whisper model for speech-to-text conversion with knowledge distilled from an IndicTrans2 MT model. We first fine-tune the IndicTrans2 model to generate pseudo data in Indic languages. This pseudo data, along with the original English speech data, is then used to fine-tune the Whisper model. Experimental results show that the cascaded system achieved a BLEU score of 51.0, outperforming the end-to-end model, which scored 19.1 BLEU. Moreover, the analysis indicates that applying knowledge distillation from the IndicTrans2 model to the end-to-end ST model improves the translation quality by about 0.7 BLEU."
}
Markdown (Informal)
[NICT's Cascaded and End-To-End Speech Translation Systems using Whisper and IndicTrans2 for the Indic Task](https://aclanthology.org/2024.iwslt-1.3/) (Dabre & Song, IWSLT 2024)
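
A minimal sketch of the cascaded ST approach described in the abstract, assuming Hugging Face `transformers` with a Whisper checkpoint for English ASR and an IndicTrans2 checkpoint for English-to-Indic MT. The model names, the language-tag input format, and the `cascaded_translate` helper are illustrative assumptions, not the authors' exact setup: the paper fine-tunes both models, and IndicTrans2 normally relies on its own preprocessing toolkit.

```python
# Illustrative cascaded speech translation sketch (not the authors' code):
# stage 1 transcribes English audio with Whisper, stage 2 translates the
# transcript into an Indic language with IndicTrans2.
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Stage 1: ASR. The paper uses a fine-tuned Whisper model; the checkpoint
# name here is an assumed off-the-shelf placeholder.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Stage 2: MT. The paper uses a fine-tuned IndicTrans2 model; this public
# checkpoint name is an assumption, and it requires trust_remote_code.
mt_name = "ai4bharat/indictrans2-en-indic-dist-200M"
mt_tokenizer = AutoTokenizer.from_pretrained(mt_name, trust_remote_code=True)
mt_model = AutoModelForSeq2SeqLM.from_pretrained(mt_name, trust_remote_code=True)


def cascaded_translate(audio_path: str, tgt_lang_tag: str = "hin_Deva") -> str:
    """Transcribe English audio, then translate the transcript."""
    english_text = asr(audio_path)["text"]
    # IndicTrans2 expects source/target language tags prepended to the input;
    # the exact tagging and preprocessing shown here are an assumption
    # (the official IndicTransToolkit handles this step).
    tagged = f"eng_Latn {tgt_lang_tag} {english_text}"
    inputs = mt_tokenizer(tagged, return_tensors="pt")
    output_ids = mt_model.generate(**inputs, max_new_tokens=256)
    return mt_tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (hypothetical audio file):
# print(cascaded_translate("sample_en.wav", tgt_lang_tag="hin_Deva"))
```

The end-to-end variant in the paper instead fine-tunes Whisper directly on English speech paired with pseudo Indic translations produced by the fine-tuned IndicTrans2 model (knowledge distillation); that training loop is not sketched here.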