Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech

Guan-Ting Lin; Wei Ping Huang; Hung-Yi Lee

doi:10.18653/v1/2024.emnlp-main.1116

Abstract

Deep-learning-based end-to-end Automatic Speech Recognition (ASR) has made significant strides but still performs poorly on out-of-domain samples because of the domain shifts that occur in real-world scenarios. Test-Time Adaptation (TTA) methods address this issue by adapting models on test samples at inference time. However, current ASR TTA methods have largely focused on non-continual TTA, which limits cross-sample knowledge learning compared with continual TTA. In this work, we first propose a Fast-slow TTA framework for ASR that leverages the advantages of both continual and non-continual TTA. Following this framework, we introduce Dynamic SUTA (DSUTA), an entropy-minimization-based continual TTA method for ASR. To make DSUTA robust to time-varying data, we design a dynamic reset strategy that automatically detects domain shifts and resets the model, making it more effective on multi-domain data. Our method demonstrates superior performance on various noisy ASR datasets, outperforming both non-continual and continual TTA baselines while remaining robust to domain changes without requiring domain boundary information.

Anthology ID:: 2024.emnlp-main.1116
Volume:: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2024
Address:: Miami, Florida, USA
Editors:: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 20003–20015
URL:: https://aclanthology.org/2024.emnlp-main.1116
DOI:: 10.18653/v1/2024.emnlp-main.1116
Cite (ACL):: Guan-Ting Lin, Wei Ping Huang, and Hung-yi Lee. 2024. Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20003–20015, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):: Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech (Lin et al., EMNLP 2024)
PDF:: https://preview.aclanthology.org/dois-2013-emnlp/2024.emnlp-main.1116.pdf