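基于自监督表征蒸馏的Whisper低资源语音识别优化方法 [An Optimization Method for Whisper Low-Resource Speech Recognition Based on Self-Supervised Representation Distillation]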

Jian Hu, Ling Dong, Wenjun Wang, Yan Xiang, Shengxiang Gao, Zhengtao Yu


Abstract
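Whisper is a powerful multilingual speech recognition model that performs well on high-resource languages such as English, but its performance on some low-resource languages, such as Burmese, remains limited by insufficient pre-training data. To address this, this paper proposes a method for optimizing Whisper for low-resource speech recognition based on self-supervised representation distillation. A cross-model representation distillation mechanism transfers knowledge from the representations of a self-supervised model to the Whisper encoder, improving the encoder's ability to model representations for languages such as Burmese. Experimental results show that the method effectively reduces the character error rate on Burmese, Khmer, Uzbek, and Punjabi ASR tasks, validating its effectiveness.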
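
The abstract reports results as character error rate (CER). As a reading aid, here is a minimal sketch of how CER is conventionally computed: the Levenshtein edit distance between the reference and hypothesis character sequences, divided by the reference length. The function names `edit_distance` and `cer` are illustrative, and the abstract does not describe the paper's own scoring setup (text normalization, handling of spaces or combining marks), so this should not be read as the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference character
            insertion = curr[j - 1] + 1  # extra hypothesis character
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character out of four.
    print(cer("abcd", "abxd"))
```

Python strings iterate over Unicode code points, so for scripts such as Burmese or Khmer each combining mark counts as its own character in this sketch. Scoring by grapheme cluster instead would give different values.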
Anthology ID:
2025.ccl-1.42
Volume:
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Month:
August
Year:
2025
Address:
Jinan, China
Editors:
Maosong Sun, Peiyong Duan, Zhiyuan Liu, Ruifeng Xu, Weiwei Sun
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
563–573
URL:
https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.42/
Cite (ACL):
Jian Hu, Ling Dong, Wenjun Wang, Yan Xiang, Shengxiang Gao, and Zhengtao Yu. 2025. 基于自监督表征蒸馏的Whisper低资源语音识别优化方法. In Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025), pages 563–573, Jinan, China. Chinese Information Processing Society of China.
Cite (Informal):
基于自监督表征蒸馏的Whisper低资源语音识别优化方法 (Hu et al., CCL 2025)
PDF:
https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.42.pdf