Extending the MuST-C Corpus for a Comparative Evaluation of Speech Translation Technology
Luisa Bentivogli, Mauro Cettolo, Marco Gaido, Alina Karakanta, Matteo Negri, Marco Turchi
Abstract
This project aimed at extending the test sets of the MuST-C speech translation (ST) corpus with new reference translations. The new references were collected from professional post-editors working on the output of different ST systems for three language pairs: English-German, English-Italian, and English-Spanish. In this paper, we briefly describe how the data were collected and how they are distributed. As evidence of their usefulness, we also summarise the findings of the first comparative evaluation of cascade and direct ST approaches, which was carried out using the collected data. The project was partially funded by the European Association for Machine Translation (EAMT) through its 2020 Sponsorship of Activities programme.
- Anthology ID:
- 2022.eamt-1.70
- Volume:
- Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
- Month:
- June
- Year:
- 2022
- Address:
- Ghent, Belgium
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 361–362
- URL:
- https://aclanthology.org/2022.eamt-1.70
- Cite (ACL):
- Luisa Bentivogli, Mauro Cettolo, Marco Gaido, Alina Karakanta, Matteo Negri, and Marco Turchi. 2022. Extending the MuST-C Corpus for a Comparative Evaluation of Speech Translation Technology. In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 361–362, Ghent, Belgium. European Association for Machine Translation.
- Cite (Informal):
- Extending the MuST-C Corpus for a Comparative Evaluation of Speech Translation Technology (Bentivogli et al., EAMT 2022)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2022.eamt-1.70.pdf