Emotion-Aware Dysarthric Speech Reconstruction: LLMs and Multimodal Evaluation with MCDS
Kaushal Attaluri | Radhika Mamidi | Sireesha Chittepu | Anirudh Chebolu | Hitendra Sarma Thogarcheti
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Dysarthria, a motor speech disorder affecting over 46 million individuals globally, impairs both intelligibility and emotional expression in communication. This work introduces a novel framework for emotion-aware sentence reconstruction from dysarthric speech using Large Language Models (LLMs) fine-tuned with QLoRA, namely LLaMA 3.1 and Mistral 8x7B. Our pipeline recognizes emotion directly from raw audio and conditions textual reconstruction on this emotional context to enhance both semantic and affective fidelity. We propose the Multimodal Communication Dysarthria Score (MCDS), a holistic evaluation metric combining BLEU (B), semantic similarity (S), emotion consistency (E), and human ratings (H): MCDS = αB + βE + γS + δH, where α + β + γ + δ = 1. On our extended TORGO+ dataset, our emotion-aware LLM achieves an MCDS of 0.87 and a BLEU of 72.4%, significantly outperforming traditional pipelines such as Kaldi GMM-HMM (MCDS: 0.52, BLEU: 38.1%) and Whisper-based models, and surpassing baseline LLM systems by 0.09 MCDS. These results set a new benchmark in emotionally intelligent dysarthric speech reconstruction, with future directions including multilingual support and real-time deployment.
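For concreteness, the sketch below shows one way the MCDS weighted combination could be computed. The abstract only specifies the weighted-sum form with weights summing to 1; the particular weight values, the assumption that each component is normalized to [0, 1], and the function name used here are illustrative, not the authors' implementation.

```python
# Minimal sketch of the MCDS weighted combination: MCDS = alpha*B + beta*E + gamma*S + delta*H.
# The equal weights and the [0, 1] normalization of each component are assumptions for illustration.

def mcds(bleu: float, emotion_consistency: float, semantic_similarity: float,
         human_rating: float, alpha: float = 0.25, beta: float = 0.25,
         gamma: float = 0.25, delta: float = 0.25) -> float:
    """Return the weighted sum of the four components (each assumed to lie in [0, 1])."""
    if abs(alpha + beta + gamma + delta - 1.0) > 1e-9:
        raise ValueError("weights alpha, beta, gamma, delta must sum to 1")
    return (alpha * bleu
            + beta * emotion_consistency
            + gamma * semantic_similarity
            + delta * human_rating)


# Example: a BLEU of 72.4% enters as 0.724 under the normalization assumption;
# the other component values here are placeholders, not reported results.
print(mcds(bleu=0.724, emotion_consistency=0.90, semantic_similarity=0.88, human_rating=0.92))
```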