@inproceedings{he-garner-2023-interpreter,
    title = "The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation",
    author = "He, Mutian  and
      Garner, Philip",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.findings-emnlp.291/",
    doi = "10.18653/v1/2023.findings-emnlp.291",
    pages = "4408--4423",
    abstract = "End-to-end spoken language understanding (SLU) remains elusive even with current large pretrained language models on text and speech, especially in multilingual cases. Machine translation has been established as a powerful pretraining objective on text as it enables the model to capture high-level semantics of the input utterance and associations between different languages, which is desired for speech models that work on lower-level acoustic frames. Motivated particularly by the task of cross-lingual SLU, we demonstrate that the task of speech translation (ST) is a good means of pretraining speech models for end-to-end SLU on both intra- and cross-lingual scenarios. By introducing ST, our models reach higher performance over baselines on monolingual and multilingual intent classification as well as spoken question answering using SLURP, MINDS-14, and NMSQA benchmarks. To verify the effectiveness of our methods, we also create new benchmark datasets from both synthetic and real sources, for speech summarization and low-resource/zero-shot transfer from English to French or Spanish. We further show the value of preserving knowledge for the ST pretraining task for better downstream performance, possibly using Bayesian transfer regularizers."
}