DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts

Jie Zhou, Shengxiang Gao, Zhengtao Yu, Ling Dong, Wenjun Wang


Abstract
Dialect speech recognition has always been one of the challenges in Automatic Speech Recognition (ASR) systems. While many ASR systems perform well on Mandarin, their performance drops significantly when handling dialect speech. This is mainly due to the clear differences in pronunciation between the dialects and Mandarin, and to the scarcity of dialect speech data. In this paper, we propose DialectMoE, a Chinese multi-dialect speech recognition model based on Mixture-of-Experts (MoE) for low-resource conditions. Specifically, DialectMoE assigns input sequences to a set of experts using a dynamic routing algorithm, with each expert potentially trained for a specific dialect. The outputs of these experts are then combined to derive the final output. Because of the similarities among dialects, an expert may also help in recognizing dialects other than its own. Experimental results on the Datatang public dialect dataset show that, compared with the baseline model, DialectMoE reduces the Character Error Rate (CER) for the Sichuan, Yunnan, Hubei, and Henan dialects by 23.6%, 32.6%, 39.2%, and 35.09%, respectively. The proposed DialectMoE model demonstrates outstanding performance in multi-dialect speech recognition.
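The abstract describes routing inputs to a subset of experts and combining their outputs. The sketch below illustrates that general pattern with a generic top-k softmax gate, a common MoE routing scheme; it is not the paper's dynamic routing algorithm, whose details are not given here. The class `TopKMoE`, the single-linear-layer experts, per-frame routing, the choice of four experts, and `k=2` are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    """Small Gaussian-initialized weights (hypothetical initialization)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a rows x cols matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TopKMoE:
    """Hypothetical MoE layer: linear experts plus a top-k softmax gate.

    This is a generic routing scheme, not DialectMoE's own dynamic routing.
    """

    def __init__(self, dim: int, num_experts: int, k: int = 2, seed: int = 0):
        if not 1 <= k <= num_experts:
            raise ValueError("k must be between 1 and num_experts")
        rng = random.Random(seed)
        # One dim x dim linear map per expert (e.g. one per dialect, an assumption).
        self.experts = [random_matrix(rng, dim, dim) for _ in range(num_experts)]
        # The gate produces one logit per expert for each input frame.
        self.gate = random_matrix(rng, num_experts, dim)
        self.k = k

    def route(self, frame: Vector) -> List[Tuple[int, float]]:
        """Pick the k highest-scoring experts and renormalize their weights."""
        probs = softmax(matvec(self.gate, frame))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.k]
        norm = sum(probs[i] for i in top)
        return [(i, probs[i] / norm) for i in top]

    def forward(self, frame: Vector) -> Vector:
        """Gate-weighted sum of the selected experts' outputs for one frame."""
        out = [0.0] * len(frame)
        for i, weight in self.route(frame):
            y = matvec(self.experts[i], frame)
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out

    def forward_sequence(self, frames: List[Vector]) -> List[Vector]:
        """Route every frame of an input sequence independently."""
        return [self.forward(f) for f in frames]


# Usage: 4 experts (one per dialect in the abstract, an assumption), 2 active per frame.
moe = TopKMoE(dim=4, num_experts=4, k=2)
frames = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.1, 0.2]]
outputs = moe.forward_sequence(frames)
print(moe.route(frames[0]))  # selected (expert index, weight) pairs for frame 0
```

Keeping more than one expert active per input (`k > 1`) lets a frame draw on several experts at once, which is one simple way the cross-dialect sharing described in the abstract could be realized.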
Anthology ID: 2024.ccl-1.89
Volume: Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month: July
Year: 2024
Address: Taiyuan, China
Editors: Sun Maosong, Liang Jiye, Han Xianpei, Liu Zhiyuan, He Yulan
Venue: CCL
Publisher: Chinese Information Processing Society of China
Pages: 1148–1159
Language: English
URL: https://preview.aclanthology.org/author-degibert/2024.ccl-1.89/
Cite (ACL): Jie Zhou, Shengxiang Gao, Zhengtao Yu, Ling Dong, and Wenjun Wang. 2024. DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 1148–1159, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal): DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts (Zhou et al., CCL 2024)
PDF: https://preview.aclanthology.org/author-degibert/2024.ccl-1.89.pdf