@inproceedings{han-etal-2023-modality,
    title = "Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation",
    author = "Han, Yuchen  and
      Xu, Chen  and
      Xiao, Tong  and
      Zhu, Jingbo",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.acl-short.115/",
    doi = "10.18653/v1/2023.acl-short.115",
    pages = "1340--1348",
    abstract = "Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace ``modality gap'' between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning and does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the ``capacity gap'': high-resource tasks (such as ASR and MT) always require a large model to fit; when the model is reused for a low-resource task (E2E ST), it yields sub-optimal performance due to over-fitting. In a case study, we find that regularization plays a more important role than the well-designed modality adaption method, which achieves 29.0 for en-de and 40.3 for en-fr on the MuST-C dataset."
}