Evaluating Multilingual Speech Translation under Realistic Conditions with Resegmentation and Terminology

Elizabeth Salesky, Kareem Darwish, Mohamed Al-Badrashiny, Mona Diab, Jan Niehues


Abstract
We present the ACL 60/60 evaluation sets for multilingual translation of ACL 2022 technical presentations into 10 target languages. This dataset enables further research into multilingual speech translation under realistic recording conditions with unsegmented audio and domain-specific terminology, applying NLP tools to text and speech in the technical domain, and evaluating and improving model robustness to diverse speaker demographics.
Anthology ID:
2023.iwslt-1.2
Volume:
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
62–78
URL:
https://aclanthology.org/2023.iwslt-1.2
DOI:
10.18653/v1/2023.iwslt-1.2
Cite (ACL):
Elizabeth Salesky, Kareem Darwish, Mohamed Al-Badrashiny, Mona Diab, and Jan Niehues. 2023. Evaluating Multilingual Speech Translation under Realistic Conditions with Resegmentation and Terminology. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 62–78, Toronto, Canada (in-person and online). Association for Computational Linguistics.
Cite (Informal):
Evaluating Multilingual Speech Translation under Realistic Conditions with Resegmentation and Terminology (Salesky et al., IWSLT 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.iwslt-1.2.pdf
Dataset:
2023.iwslt-1.2.dataset.zip