FCGCL: Fine- and Coarse-Granularity Contrastive Learning for Speech Translation

Hao Zhang, Nianwen Si, Yaqi Chen, Zhen Li, Tong Niu, Xukui Yang, Dan Qu


Abstract
End-to-end speech translation (E2E-ST) models are notoriously difficult to build because of the task complexity and data scarcity. Existing techniques often attempt implicit knowledge transfer from machine translation (MT) to the ST model by imposing various constraints. However, a significant problem in this transfer scenario is that MT performance drops significantly, which in turn limits the final transfer effect. In this article, we propose Fine- and Coarse-Granularity Contrastive Learning (FCGCL), which conducts explicit knowledge transfer from the MT to the ST model. Specifically, through multi-granularity contrastive learning we ensure that inputs with similar semantics across different modalities are encoded closely in the shared semantic space, while inputs with different semantics are kept apart. Experiments on the MuST-C dataset on all 8 language pairs, together with further analysis, show that our method effectively improves E2E-ST performance, achieving an average BLEU score of 29.0.
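The abstract describes pulling semantically matched speech/text pairs together in a shared space while pushing mismatched pairs apart. As a rough illustration only (the paper's exact fine- and coarse-granularity losses are not reproduced here), a sentence-level contrastive objective of this kind is commonly implemented as a symmetric InfoNCE loss over a batch of paired embeddings; the function name and shapes below are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(speech_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss.

    Row i of speech_emb and row i of text_emb form a positive
    (semantically matched) pair; all other rows in the batch act
    as negatives. Matched pairs are pulled together, others apart.
    """
    # L2-normalize so dot products become cosine similarities.
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # (batch, batch) similarity matrix

    # Speech-to-text direction: softmax over each row,
    # the diagonal entries are the positive pairs.
    log_prob_s2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_s2t = -np.mean(np.diag(log_prob_s2t))

    # Text-to-speech direction: softmax over each column.
    log_prob_t2s = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    loss_t2s = -np.mean(np.diag(log_prob_t2s))

    return (loss_s2t + loss_t2s) / 2
```

Minimizing this loss drives correctly paired speech/text embeddings toward high cosine similarity relative to all in-batch negatives; a fine-granularity variant would apply the same idea at the frame/token level rather than the sentence level.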
Anthology ID:
2022.findings-emnlp.222
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3048–3059
URL:
https://aclanthology.org/2022.findings-emnlp.222
DOI:
10.18653/v1/2022.findings-emnlp.222
Cite (ACL):
Hao Zhang, Nianwen Si, Yaqi Chen, Zhen Li, Tong Niu, Xukui Yang, and Dan Qu. 2022. FCGCL: Fine- and Coarse-Granularity Contrastive Learning for Speech Translation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3048–3059, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
FCGCL: Fine- and Coarse-Granularity Contrastive Learning for Speech Translation (Zhang et al., Findings 2022)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-emnlp.222.pdf