Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

Ethan A. Chi; Julian Salazar; Katrin Kirchhoff

doi:10.18653/v1/2021.naacl-main.154

Abstract

Non-autoregressive encoder-decoder models greatly improve decoding speed over autoregressive models, at the expense of generation quality. To mitigate this, iterative decoding models repeatedly infill or refine the proposal of a non-autoregressive model. However, editing at the level of output sequences limits model flexibility. We instead propose *iterative realignment*, which, by refining latent alignments, allows more flexible edits in fewer steps. Our model, Align-Refine, is an end-to-end Transformer that iteratively realigns connectionist temporal classification (CTC) alignments. On the WSJ dataset, Align-Refine matches an autoregressive baseline with a 14× decoding speedup; on LibriSpeech, we reach an LM-free test-other WER of 9.0% (a 19% relative improvement over comparable work) in three iterations. We release our code at https://github.com/amazon-research/align-refine.
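To make "latent alignment" concrete, here is a minimal sketch (not the authors' code) of the standard CTC collapsing function, which merges runs of repeated frame labels and then removes blanks. The blank symbol `_`, the character-level toy vocabulary, and the helper `edit_frame` are assumptions made for illustration. The example suggests why editing alignments can be more flexible than editing output sequences: relabeling a single frame, with the alignment length held fixed, can yield a substitution, an insertion, or a deletion in the collapsed output.

```python
from typing import List, Optional, Sequence

BLANK = "_"  # assumption: "_" stands for the CTC blank symbol in this sketch


def ctc_collapse(alignment: Sequence[str], blank: str = BLANK) -> str:
    """Collapse a frame-level CTC alignment into its output string:
    merge runs of repeated symbols, then drop blanks."""
    out: List[str] = []
    prev: Optional[str] = None
    for sym in alignment:
        # Keep a symbol only when it starts a new run and is not the blank.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


def edit_frame(alignment: Sequence[str], index: int, symbol: str) -> List[str]:
    """Hypothetical helper: return a copy of the alignment with one frame relabeled."""
    new = list(alignment)
    new[index] = symbol
    return new


if __name__ == "__main__":
    # A toy 6-frame, character-level alignment for the word "cat".
    align = ["c", "c", BLANK, "a", "t", "t"]
    print(ctc_collapse(align))                        # cat
    # Each edit below changes exactly one frame and keeps 6 frames,
    # yet produces a different kind of edit in the output string.
    print(ctc_collapse(edit_frame(align, 3, "u")))    # cut  (substitution)
    print(ctc_collapse(edit_frame(align, 2, "h")))    # chat (insertion)
    print(ctc_collapse(edit_frame(align, 3, BLANK)))  # ct   (deletion)
```

In Align-Refine the refined alignments come from the model itself; here the frame edits are made by hand only to show how one alignment-level change maps to an output-level edit.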

Anthology ID: 2021.naacl-main.154
Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: June
Year: 2021
Address: Online
Editors: Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1920–1927
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2021.naacl-main.154/
DOI: 10.18653/v1/2021.naacl-main.154
Cite (ACL): Ethan A. Chi, Julian Salazar, and Katrin Kirchhoff. 2021. Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1920–1927, Online. Association for Computational Linguistics.
Cite (Informal): Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment (Chi et al., NAACL 2021)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2021.naacl-main.154.pdf
Video: https://preview.aclanthology.org/add-emnlp-2024-awards/2021.naacl-main.154.mp4
Data: LibriSpeech