Translations of the Callhome Egyptian Arabic corpus for conversational speech translation
Gaurav Kumar, Yuan Cao, Ryan Cotterell, Chris Callison-Burch, Daniel Povey, Sanjeev Khudanpur
Abstract
Translation of the output of automatic speech recognition (ASR) systems, also known as speech translation, has received a lot of research interest recently. This is especially true for programs such as DARPA BOLT which focus on improving spontaneous human-human conversation across languages. However, this research is hindered by the dearth of datasets developed for this explicit purpose. For Egyptian Arabic-English, in particular, no parallel speech-transcription-translation dataset exists in the same domain. In order to support research in speech translation, we introduce the Callhome Egyptian Arabic-English Speech Translation Corpus. This supplements the existing LDC corpus with four reference translations for each utterance in the transcripts. The result is a three-way parallel dataset of Egyptian Arabic Speech, transcriptions and English translations.
- Anthology ID:
- 2014.iwslt-papers.13
- Volume:
- Proceedings of the 11th International Workshop on Spoken Language Translation: Papers
- Month:
- December 4-5
- Year:
- 2014
- Address:
- Lake Tahoe, California
- Editors:
- Marcello Federico, Sebastian Stüker, François Yvon
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Pages:
- 244–248
- URL:
- https://aclanthology.org/2014.iwslt-papers.13
- Cite (ACL):
- Gaurav Kumar, Yuan Cao, Ryan Cotterell, Chris Callison-Burch, Daniel Povey, and Sanjeev Khudanpur. 2014. Translations of the Callhome Egyptian Arabic corpus for conversational speech translation. In Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, pages 244–248, Lake Tahoe, California.
- Cite (Informal):
- Translations of the Callhome Egyptian Arabic corpus for conversational speech translation (Kumar et al., IWSLT 2014)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2014.iwslt-papers.13.pdf