Zhanpeng Chen


2024

MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts
Xuxin Cheng | Zhihong Zhu | Xianwei Zhuang | Zhanpeng Chen | Zhiqi Huang | Yuexian Zou
Findings of the Association for Computational Linguistics: ACL 2024

As a crucial task in task-oriented dialogue systems, spoken language understanding (SLU) has garnered increasing attention. However, errors from automatic speech recognition (ASR) often degrade understanding performance. To tackle this problem, we propose MoE-SLU, an ASR-robust SLU framework based on the mixture-of-experts technique. Specifically, we first introduce three strategies to generate additional transcripts from clean transcripts. Then, we employ the mixture-of-experts technique to weigh the representations of the generated transcripts, the ASR transcripts, and the corresponding clean manual transcripts. Additionally, we regularize the weighted average of the predictions and the predictions of the ASR transcripts by minimizing the Jensen-Shannon Divergence (JSD) between these two output distributions. Experimental results on three benchmark SLU datasets demonstrate that MoE-SLU achieves state-of-the-art performance. Further model analysis also verifies the superiority of our method.
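A minimal sketch of the two core pieces the abstract describes, assuming PyTorch: a gating network that weighs per-transcript representations, and a JSD regularizer between the averaged predictions and the ASR-transcript predictions. The class and function names, shapes, and gating design are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

class TranscriptMoE(torch.nn.Module):
    """Hypothetical gate: weighs the representations of the generated,
    ASR, and clean transcripts (one representation per transcript)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate = torch.nn.Linear(hidden_dim, 1)

    def forward(self, reps: torch.Tensor) -> torch.Tensor:
        # reps: (batch, num_transcripts, hidden_dim)
        scores = self.gate(reps).squeeze(-1)               # (batch, num_transcripts)
        weights = F.softmax(scores, dim=-1)                # mixture weights
        return (weights.unsqueeze(-1) * reps).sum(dim=1)   # weighted representation

def jsd(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon Divergence between two output distributions,
    e.g. the weighted-average predictions vs. the ASR-transcript ones."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = (0.5 * (p + q)).clamp_min(1e-8)
    kl_pm = F.kl_div(m.log(), p, reduction="batchmean")    # KL(p || m)
    kl_qm = F.kl_div(m.log(), q, reduction="batchmean")    # KL(q || m)
    return 0.5 * (kl_pm + kl_qm)
```

The JSD term is symmetric and bounded, which makes it a gentler consistency penalty than a one-directional KL when neither distribution should be treated as the fixed target.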

Code-Switching Can be Better Aligners: Advancing Cross-Lingual SLU through Representation-Level and Prediction-Level Alignment
Zhihong Zhu | Xuxin Cheng | Zhanpeng Chen | Xianwei Zhuang | Zhiqi Huang | Yuexian Zou
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Zero-shot cross-lingual spoken language understanding (SLU) can promote the global deployment of dialogue systems and has therefore attracted increasing attention. While current code-switching-based cross-lingual SLU frameworks have shown promising results, they (i) predominantly rely on contrastive objectives to model hard alignment, which may disrupt the inherent structure within the sentences of each language; and (ii) focus their optimization objectives solely on the original sentences, neglecting the relation between original and code-switched sentences, which may keep contextualized embeddings from aligning further. In this paper, we propose a novel framework dubbed REPE (short for Representation-Level and Prediction-Level Alignment), which leverages both code-switched and original sentences to achieve multi-level alignment. Specifically, REPE introduces optimal transport to facilitate soft alignment between the representations of code-switched and original sentences, preserving structural integrity as much as possible. Moreover, REPE adopts multi-view learning to enforce consistency regularization between the predictions of the two sentences, aligning them into a more refined language-invariant space. On top of this, we further incorporate a self-distillation layer to boost the robustness of REPE. Extensive experiments on two benchmarks across ten languages demonstrate the superiority of the proposed REPE framework.
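A minimal sketch of the representation-level alignment the abstract describes, assuming PyTorch: entropic optimal transport via plain Sinkhorn iterations between the token representations of an original sentence and its code-switched counterpart, with the resulting transport cost as the alignment loss. The cosine cost, uniform marginals, and hyperparameters are illustrative assumptions, not the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def ot_alignment_loss(x: torch.Tensor, y: torch.Tensor,
                      eps: float = 0.05, n_iters: int = 50) -> torch.Tensor:
    """Soft alignment between token representations x (n, d) of the
    original sentence and y (m, d) of the code-switched sentence,
    computed with entropy-regularized OT (Sinkhorn iterations)."""
    # Cost: 1 - cosine similarity for every token pair.
    cost = 1.0 - F.normalize(x, dim=-1) @ F.normalize(y, dim=-1).T   # (n, m)
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)        # uniform source marginal
    b = torch.full((m,), 1.0 / m)        # uniform target marginal
    K = torch.exp(-cost / eps)           # Gibbs kernel
    v = torch.ones(m)
    for _ in range(n_iters):             # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)   # soft alignment matrix
    return (plan * cost).sum()           # transport cost as the loss
```

Unlike a contrastive objective that forces hard one-to-one matches, the transport plan distributes each token's mass across several counterparts, which is what lets the alignment stay "soft" and preserve sentence-internal structure. A production version would typically run Sinkhorn in the log domain for numerical stability.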