@inproceedings{sildam-etal-2024-finetuning,
    title = "Finetuning End-to-End Models for {E}stonian Conversational Spoken Language Translation",
    author = {Sildam, Tiia  and
      Velve, Andra  and
      Alum{\"a}e, Tanel},
    editor = "Ojha, Atul Kr.  and
      Liu, Chao-hong  and
      Vylomova, Ekaterina  and
      Pirinen, Flammie  and
      Abbott, Jade  and
      Washington, Jonathan  and
      Oco, Nathaniel  and
      Malykh, Valentin  and
      Logacheva, Varvara  and
      Zhao, Xiaobing",
    booktitle = "Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.loresmt-1.17/",
    doi = "10.18653/v1/2024.loresmt-1.17",
    pages = "166--174",
    abstract = "This paper investigates the finetuning of end-to-end models for bidirectional Estonian-English and Estonian-Russian conversational speech-to-text translation. Due to the limited availability of speech translation data for Estonian, we created additional training data by web scraping and synthesizing data from speech recognition datasets using machine translation. We evaluated three publicly available end-to-end models: Whisper, OWSM 3.1, and SeamlessM4T. Our results indicate that fine-tuning with synthetic data enhances translation accuracy by a large margin, with SeamlessM4T matching or surpassing cascaded speech translation systems that use state-of-the-art speech recognition and machine translation models."
}