DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts
Jie Zhou, Shengxiang Gao, Zhengtao Yu, Ling Dong, Wenjun Wang
Abstract
“Dialect speech recognition has always been one of the challenges in Automatic Speech Recognition (ASR) systems. While many ASR systems perform well on Mandarin, their performance drops significantly when handling dialect speech. This is mainly due to the pronounced differences in pronunciation between dialects and Mandarin, and to the scarcity of dialect speech data. In this paper, we propose DialectMoE, a Chinese multi-dialect speech recognition model based on Mixture-of-Experts (MoE) under low-resource conditions. Specifically, DialectMoE assigns input sequences to a set of experts using a dynamic routing algorithm, with each expert potentially trained for a specific dialect. The outputs of these experts are then combined to derive the final output. Owing to the similarities among dialects, individual experts may also assist in recognizing dialects other than their own. Experimental results on the Datatang public dialect dataset show that, compared with the baseline model, DialectMoE reduces the Character Error Rate (CER) for the Sichuan, Yunnan, Hubei and Henan dialects by 23.6%, 32.6%, 39.2% and 35.09%, respectively. The proposed DialectMoE model demonstrates outstanding performance in multi-dialect speech recognition.”
- Anthology ID: 2024.ccl-1.89
- Volume: Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
- Month: July
- Year: 2024
- Address: Taiyuan, China
- Editors: Sun Maosong, Liang Jiye, Han Xianpei, Liu Zhiyuan, He Yulan
- Venue: CCL
- SIG:
- Publisher: Chinese Information Processing Society of China
- Note:
- Pages: 1148–1159
- Language: English
- URL: https://preview.aclanthology.org/author-degibert/2024.ccl-1.89/
- DOI:
- Cite (ACL): Jie Zhou, Shengxiang Gao, Zhengtao Yu, Ling Dong, and Wenjun Wang. 2024. DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 1148–1159, Taiyuan, China. Chinese Information Processing Society of China.
- Cite (Informal): DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts (Zhou et al., CCL 2024)
- PDF: https://preview.aclanthology.org/author-degibert/2024.ccl-1.89.pdf
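The abstract's core idea — a router assigns each input to a few experts and the expert outputs are combined by the (renormalized) routing weights — can be sketched as below. This is not the authors' implementation: it is a minimal NumPy illustration of generic top-k softmax gating over per-frame feature vectors, and all shapes, names, and the ReLU feed-forward expert form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D, H, E, K = 8, 16, 4, 2  # feature dim, expert hidden dim, number of experts, top-k

# Each "expert" is a tiny feed-forward net (hypothetical shapes).
experts = [(rng.standard_normal((D, H)) * 0.1, rng.standard_normal((H, D)) * 0.1)
           for _ in range(E)]
gate_w = rng.standard_normal((D, E)) * 0.1  # router projection


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def moe_layer(x):
    """Route each frame vector in x (T, D) to its top-K experts and
    combine the expert outputs, weighted by renormalized gate scores."""
    probs = softmax(x @ gate_w)                 # (T, E) routing probabilities
    topk = np.argsort(probs, axis=-1)[:, -K:]   # indices of the K best experts per frame
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        idx = topk[t]
        w = probs[t, idx]
        w = w / w.sum()                          # renormalize over the chosen experts
        for e_i, w_i in zip(idx, w):
            w1, w2 = experts[e_i]
            h = np.maximum(x[t] @ w1, 0.0)       # ReLU feed-forward expert
            out[t] += w_i * (h @ w2)
    return out


frames = rng.standard_normal((5, D))  # 5 speech-frame feature vectors
y = moe_layer(frames)
print(y.shape)  # (5, 8)
```

Because all dialects share one router, an expert that specializes in one dialect can still receive (and help decode) acoustically similar frames from another, which is the cross-dialect sharing the abstract alludes to.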