Towards Zero-shot Learning for End-to-end Cross-modal Translation Models

Jichen Yang, Kai Fan, Minpeng Liao, Boxing Chen, Zhongqiang Huang


Abstract
One of the main problems in speech translation is the mismatch between modalities. A second problem, the scarcity of parallel data covering multiple modalities, means that end-to-end multi-modal models tend to perform worse than cascade models, although there are exceptions under favorable conditions. To address these problems, we propose an end-to-end zero-shot speech translation model that connects two pre-trained uni-modality modules via word rotator’s distance. Like cascade models, the model retains zero-shot capability, and it can also be trained end to end to avoid error propagation. Our comprehensive experiments on the MuST-C benchmarks show that our end-to-end zero-shot approach performs as well as or better than CTC-based cascade models, and that our end-to-end model with supervised training matches the latest baselines.
Anthology ID:
2023.findings-emnlp.871
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13078–13087
URL:
https://aclanthology.org/2023.findings-emnlp.871
DOI:
10.18653/v1/2023.findings-emnlp.871
Cite (ACL):
Jichen Yang, Kai Fan, Minpeng Liao, Boxing Chen, and Zhongqiang Huang. 2023. Towards Zero-shot Learning for End-to-end Cross-modal Translation Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13078–13087, Singapore. Association for Computational Linguistics.
Cite (Informal):
Towards Zero-shot Learning for End-to-end Cross-modal Translation Models (Yang et al., Findings 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.findings-emnlp.871.pdf