Dynamic Model-Bank Test-Time Adaptation for Automatic Speech Recognition
Yanshuo Wang | Yanghao Zhou | Yukang Lin | Haoxing Chen | Jin Zhang | Wentao Zhu | Jie Hong | Xuesong Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
End-to-end automatic speech recognition (ASR) based on deep learning has achieved impressive progress in recent years. However, the performance of ASR foundation models often degrades significantly on out-of-domain data due to real-world domain shifts. Test-Time Adaptation (TTA) methods aim to mitigate this issue by adapting models during inference without access to source data. Despite recent progress, existing ASR TTA methods often struggle with instability under continual and long-term distribution shifts. To alleviate the risk of performance collapse due to error accumulation, we propose Dynamic Model-bank Single-Utterance Test-time Adaptation (DMSUTA), a sustainable continual TTA framework based on adaptive ASR model ensembling. DMSUTA maintains a dynamic model bank, from which a subset of checkpoints is selected for each test sample based on confidence and uncertainty criteria. To preserve both model plasticity and long-term stability, DMSUTA actively manages the bank by filtering out potentially collapsed models. This design allows DMSUTA to continually adapt to evolving domain shifts in ASR test-time scenarios. Experiments on diverse, continuously shifting ASR TTA benchmarks show that DMSUTA consistently outperforms existing continual TTA baselines, demonstrating superior robustness to domain shifts in ASR.
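The abstract describes three bank operations: adding adapted checkpoints, filtering out potentially collapsed models, and selecting a confident subset per test utterance. The sketch below illustrates that bookkeeping only; the class, scores, and thresholds are illustrative assumptions, not the paper's actual criteria or implementation.

```python
# Hypothetical sketch of dynamic model-bank management as outlined in the
# abstract. Uncertainty values, the collapse threshold, and the selection
# rule are all assumed for illustration.

class ModelBank:
    def __init__(self, max_size=4, collapse_threshold=0.9):
        self.max_size = max_size                    # bank capacity
        self.collapse_threshold = collapse_threshold
        self.bank = []                              # (checkpoint_id, uncertainty)

    def add(self, checkpoint_id, uncertainty):
        """Insert an adapted checkpoint; evict the most uncertain if over capacity."""
        self.bank.append((checkpoint_id, uncertainty))
        self.bank.sort(key=lambda m: m[1])          # lower uncertainty first
        if len(self.bank) > self.max_size:
            self.bank.pop()                         # drop the worst checkpoint

    def filter_collapsed(self):
        """Remove checkpoints whose uncertainty suggests model collapse."""
        self.bank = [m for m in self.bank if m[1] < self.collapse_threshold]

    def select(self, k=2):
        """Pick the k most confident checkpoints for the current utterance."""
        return [cid for cid, _ in self.bank[:k]]


bank = ModelBank()
for cid, u in [("ckpt_a", 0.2), ("ckpt_b", 0.95), ("ckpt_c", 0.4)]:
    bank.add(cid, u)
bank.filter_collapsed()            # "ckpt_b" exceeds the collapse threshold
print(bank.select(k=2))            # → ['ckpt_a', 'ckpt_c']
```

In a full TTA loop, the selected checkpoints would be ensembled to decode each utterance, then the adapted model re-inserted into the bank; that decoding and adaptation logic is omitted here.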