DUB: Discrete Unit Back-translation for Speech Translation

Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou


Abstract
How can speech-to-text translation (ST) perform as well as machine translation (MT)? The key point is to bridge the modality gap between speech and text so that useful MT techniques can be applied to ST. Recently, representing speech with unsupervised discrete units has yielded a new way to ease the modality problem. This motivates us to propose Discrete Unit Back-translation (DUB) to answer two questions: (1) Is it better to represent speech with discrete units than with continuous features in direct ST? (2) How much benefit can useful MT techniques bring to ST? With DUB, the back-translation technique can be successfully applied to direct ST and obtains an average boost of 5.5 BLEU on MuST-C En-De/Fr/Es. In the low-resource language scenario, our method achieves performance comparable to existing methods that rely on large-scale external data. Code and models are available at https://anonymous.4open.science/r/DUB/.
Anthology ID:
2023.findings-acl.447
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7147–7164
URL:
https://aclanthology.org/2023.findings-acl.447
Cite (ACL):
Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, and Yaqian Zhou. 2023. DUB: Discrete Unit Back-translation for Speech Translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7147–7164, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
DUB: Discrete Unit Back-translation for Speech Translation (Zhang et al., Findings 2023)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2023.findings-acl.447.pdf