Chenda Li
2026
Beyond Sentence-level Labels: Integrating Conversational Context and Personal Experience for Natural Emotional Expression
Haiyang Sun | Chenyang Le | Wei Wang | Leying Zhang | Chuang Li | Bing Han | Chenda Li | Mengxiao Bi | Yanmin Qian
Findings of the Association for Computational Linguistics: ACL 2026
Emotional Text-to-Speech aims to synthesize speech with human-like naturalness and expressiveness. However, existing systems rely on sentence-level labels, which fail to capture the subtle nuances of human affect. Based on cognitive appraisal theories, we argue that emotional expression is not generated in isolation but is deeply influenced by the speaker’s Personal Experience and the conversational Context. To overcome the information bottleneck inherent in traditional annotations, we present Emotional-Context-Speech, a large-scale, context-aware speech corpus derived from multi-speaker audiobooks. This dataset provides not only transcriptions but also dialogue context, personal experience, open-vocabulary emotion labels, and paralinguistic descriptions. Experimental results demonstrate that a TTS model trained with additional context and experience descriptions as inputs, called Emotional-Context-TTS, significantly outperforms existing methods in terms of emotional expression accuracy and naturalness.