Abstract
Spoken language translation applications for speech suffer due to conversational speech phenomena, particularly the presence of disfluencies. With the rise of end-to-end speech translation models, processing steps such as disfluency removal that were previously an intermediate step between speech recognition and machine translation need to be incorporated into model architectures. We use a sequence-to-sequence model to translate from noisy, disfluent speech to fluent text with disfluencies removed using the recently collected ‘copy-edited’ references for the Fisher Spanish-English dataset. We are able to directly generate fluent translations and introduce considerations about how to evaluate success on this task. This work provides a baseline for a new task, implicitly removing disfluencies in end-to-end translation of conversational speech.
- Anthology ID:
- N19-1285
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Editors:
- Jill Burstein, Christy Doran, Thamar Solorio
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2786–2792
- URL:
- https://aclanthology.org/N19-1285
- DOI:
- 10.18653/v1/N19-1285
- Cite (ACL):
- Elizabeth Salesky, Matthias Sperber, and Alexander Waibel. 2019. Fluent Translations from Disfluent Speech in End-to-End Speech Translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2786–2792, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Fluent Translations from Disfluent Speech in End-to-End Speech Translation (Salesky et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/naacl24-info/N19-1285.pdf