Towards Zero-shot Learning for End-to-end Cross-modal Translation Models
Jichen Yang, Kai Fan, Minpeng Liao, Boxing Chen, Zhongqiang Huang
Abstract
One of the main problems in speech translation is the mismatch between the speech and text modalities. A second problem, the scarcity of parallel data covering multiple modalities, means that end-to-end multi-modal models tend to perform worse than cascade models, although there are exceptions under favorable conditions. To address these problems, we propose an end-to-end zero-shot speech translation model that connects two pre-trained uni-modal modules via word rotator's distance. The model retains the zero-shot capability of cascade models while also allowing end-to-end training, which avoids error propagation. Our comprehensive experiments on the MuST-C benchmarks show that our end-to-end zero-shot approach performs on par with or better than CTC-based cascade models, and that our end-to-end model with supervised training also matches the latest baselines.
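The alignment objective named in the abstract, Word Rotator's Distance (Yokoi et al., 2020), casts sequence similarity as an optimal-transport problem: each token contributes mass proportional to its embedding norm, and the cost of moving mass between two tokens is the cosine distance between their vector directions. The following is a minimal illustrative sketch of that distance, not the paper's implementation; it assumes the third-party POT (`ot`) library, and the inputs are hypothetical encoder states with made-up shapes.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)


def word_rotators_distance(X: np.ndarray, Y: np.ndarray) -> float:
    """Word Rotator's Distance (Yokoi et al., 2020) between two embedding
    sequences X (n x d) and Y (m x d).

    Each token carries mass proportional to its vector norm, and the cost
    of moving mass between two tokens is the cosine distance between their
    directions; WRD is the resulting optimal-transport cost.
    """
    norm_x = np.linalg.norm(X, axis=1)
    norm_y = np.linalg.norm(Y, axis=1)
    a = norm_x / norm_x.sum()              # source mass distribution
    b = norm_y / norm_y.sum()              # target mass distribution
    U = X / norm_x[:, None]                # unit directions of X
    V = Y / norm_y[:, None]                # unit directions of Y
    M = np.maximum(1.0 - U @ V.T, 0.0)     # pairwise cosine distances
    return ot.emd2(a, b, M)                # exact earth mover's distance


# Toy usage: hypothetical 8-dim encoder states for a 3-frame "speech"
# sequence and a 4-token "text" sequence (shapes are illustrative only).
rng = np.random.default_rng(0)
speech_states = rng.normal(size=(3, 8))
text_states = rng.normal(size=(4, 8))
print(word_rotators_distance(speech_states, text_states))
```

- Anthology ID: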
- 2023.findings-emnlp.871
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 13078–13087
- URL:
- https://aclanthology.org/2023.findings-emnlp.871
- DOI:
- 10.18653/v1/2023.findings-emnlp.871
- Cite (ACL):
- Jichen Yang, Kai Fan, Minpeng Liao, Boxing Chen, and Zhongqiang Huang. 2023. Towards Zero-shot Learning for End-to-end Cross-modal Translation Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13078–13087, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Towards Zero-shot Learning for End-to-end Cross-modal Translation Models (Yang et al., Findings 2023)
- PDF:
- https://aclanthology.org/2023.findings-emnlp.871.pdf