Abstract
Non-autoregressive encoder-decoder models greatly improve decoding speed over autoregressive models, at the expense of generation quality. To mitigate this, iterative decoding models repeatedly infill or refine the proposal of a non-autoregressive model. However, editing at the level of output sequences limits model flexibility. We instead propose *iterative realignment*, which by refining latent alignments allows more flexible edits in fewer steps. Our model, Align-Refine, is an end-to-end Transformer which iteratively realigns connectionist temporal classification (CTC) alignments. On the WSJ dataset, Align-Refine matches an autoregressive baseline with a 14x decoding speedup; on LibriSpeech, we reach an LM-free test-other WER of 9.0% (19% relative improvement on comparable work) in three iterations. We release our code at https://github.com/amazon-research/align-refine.
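The decoding procedure the abstract describes can be pictured as a short loop: propose a frame-level CTC alignment, repeatedly re-predict every frame in parallel until the alignment stops changing, and only then collapse it to text. Below is a minimal illustrative sketch of that loop; the `encoder` and `refiner` interfaces, the `align_refine_decode` function, and the tensor shapes are hypothetical stand-ins for exposition, not the released Align-Refine API.

```python
import torch


def ctc_collapse(alignment, blank_id=0):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    collapsed, prev = [], None
    for tok in alignment:
        if tok != blank_id and tok != prev:
            collapsed.append(tok)
        prev = tok
    return collapsed


@torch.no_grad()
def align_refine_decode(encoder, refiner, features, num_iterations=3, blank_id=0):
    """Hypothetical sketch of iterative realignment (not the released API).

    `encoder` maps audio features to hidden states plus frame-level CTC
    logits; `refiner` re-scores a full frame-level alignment in one
    parallel (non-autoregressive) pass. Edits happen on the latent
    alignment, not on the collapsed output text.
    """
    enc_states, logits = encoder(features)         # (T, d_model), (T, vocab)
    alignment = logits.argmax(dim=-1)              # greedy CTC proposal, shape (T,)
    for _ in range(num_iterations):
        logits = refiner(alignment, enc_states)    # re-predict all T frames at once
        new_alignment = logits.argmax(dim=-1)
        if torch.equal(new_alignment, alignment):  # fixed point: further passes are no-ops
            break
        alignment = new_alignment
    return ctc_collapse(alignment.tolist(), blank_id)
```

Because each pass rewrites the length-T alignment rather than the collapsed transcript, a single iteration can insert, delete, and substitute output tokens at once, which is why a small fixed number of refinement steps (three on LibriSpeech, per the abstract) suffices.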
- Anthology ID: 2021.naacl-main.154
- Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month: June
- Year: 2021
- Address: Online
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 1920–1927
- URL: https://aclanthology.org/2021.naacl-main.154
- DOI: 10.18653/v1/2021.naacl-main.154
- Cite (ACL): Ethan A. Chi, Julian Salazar, and Katrin Kirchhoff. 2021. Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1920–1927, Online. Association for Computational Linguistics.
- Cite (Informal): Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment (Chi et al., NAACL 2021)
- PDF: https://aclanthology.org/2021.naacl-main.154.pdf
- Data: LibriSpeech