Protocol and lessons learnt from the production of parallel corpora for the evaluation of speech translation systems

Victoria Arranz, Olivier Hamon, Karim Boudahmane, Martine Garnier-Rizet


Abstract
Machine translation evaluation campaigns require the production of reference corpora against which system output can be measured automatically. This paper describes recent efforts to create such data in order to measure the quality of the systems participating in the Quaero evaluations. In particular, we focus on the protocols behind this production and on the issues raised by the complexity of the transcription data handled.
Anthology ID:
2011.iwslt-evaluation.17
Volume:
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
Month:
December 8-9
Year:
2011
Address:
San Francisco, California
Venue:
IWSLT
SIG:
SIGSLT
Pages:
129–135
URL:
https://aclanthology.org/2011.iwslt-evaluation.17
Cite (ACL):
Victoria Arranz, Olivier Hamon, Karim Boudahmane, and Martine Garnier-Rizet. 2011. Protocol and lessons learnt from the production of parallel corpora for the evaluation of speech translation systems. In Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 129–135, San Francisco, California.
Cite (Informal):
Protocol and lessons learnt from the production of parallel corpora for the evaluation of speech translation systems (Arranz et al., IWSLT 2011)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2011.iwslt-evaluation.17.pdf