Abstract
End-to-end spoken language understanding (SLU) remains elusive even with current large pretrained language models on text and speech, especially in multilingual cases. Machine translation has been established as a powerful pretraining objective on text, as it enables the model to capture high-level semantics of the input utterance and associations between different languages. This is desirable for speech models, which operate on lower-level acoustic frames. Motivated particularly by the task of cross-lingual SLU, we demonstrate that speech translation (ST) is a good means of pretraining speech models for end-to-end SLU in both intra- and cross-lingual scenarios. By introducing ST, our models outperform baselines on monolingual and multilingual intent classification as well as spoken question answering, using the SLURP, MINDS-14, and NMSQA benchmarks. To verify the effectiveness of our methods, we also create new benchmark datasets from both synthetic and real sources, for speech summarization and for low-resource/zero-shot transfer from English to French or Spanish. We further show the value of preserving knowledge from the ST pretraining task for better downstream performance, possibly using Bayesian transfer regularizers.
- Anthology ID:
- 2023.findings-emnlp.291
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4408–4423
- URL:
- https://aclanthology.org/2023.findings-emnlp.291
- DOI:
- 10.18653/v1/2023.findings-emnlp.291
- Cite (ACL):
- Mutian He and Philip Garner. 2023. The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4408–4423, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation (He & Garner, Findings 2023)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.291.pdf
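The abstract mentions preserving knowledge from ST pretraining with Bayesian transfer regularizers. The sketch below shows one common form of such a regularizer: a quadratic penalty that pulls fine-tuned parameters back toward their pretrained values. This is equivalent to a Gaussian prior centred at the pretrained weights (L2-SP style), or to a diagonal-Fisher-weighted version (EWC style). It is an illustration under stated assumptions, not the paper's implementation. The function names, the `weight` and `fisher` parameters, and the plain-list parameter representation are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical representation: each named parameter tensor is a flat list of floats.
Params = Dict[str, List[float]]


def transfer_penalty(
    params: Params,
    pretrained: Params,
    weight: float = 1.0,
    fisher: Optional[Params] = None,
) -> float:
    """Quadratic penalty toward pretrained (e.g. ST-pretrained) parameters.

    fisher=None  -> L2-SP style: isotropic Gaussian prior at the pretrained weights.
    fisher given -> EWC style: per-parameter precision from a diagonal Fisher estimate.
    """
    total = 0.0
    for name, values in params.items():
        anchor = pretrained[name]
        # Unit precision everywhere unless Fisher values are supplied.
        precision = fisher[name] if fisher is not None else [1.0] * len(values)
        for v, a, f in zip(values, anchor, precision):
            total += f * (v - a) ** 2
    return 0.5 * weight * total


def regularized_loss(
    task_loss: float,
    params: Params,
    pretrained: Params,
    weight: float = 1.0,
    fisher: Optional[Params] = None,
) -> float:
    """Downstream SLU loss plus the knowledge-preserving penalty."""
    return task_loss + transfer_penalty(params, pretrained, weight, fisher)


if __name__ == "__main__":
    current = {"w": [1.0, 2.0]}
    anchor = {"w": [0.0, 2.0]}
    print(transfer_penalty(current, anchor))                            # 0.5
    print(transfer_penalty(current, anchor, fisher={"w": [3.0, 1.0]}))  # 1.5
    print(regularized_loss(0.25, current, anchor))                      # 0.75
```

The `weight` hyperparameter trades off fitting the downstream SLU task against staying close to the ST-pretrained solution. Supplying Fisher values makes the penalty stronger on parameters that mattered most for the pretraining task.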