Self-EmoQ: Plutchik-Guided Value-based Planning to Drive Streaming Emotional TTS

Yue Zhao, Hongyan Li, Yong Chen, Luo Ji


Abstract
Emotional interaction is increasingly crucial for conversational AI, yet current systems lack a mechanism for determining their own emotion to drive streaming text-to-speech (TTS) synthesis. We propose an emotion-planning framework that determines the emotion before text generation, grounding the downstream emotional TTS in a streaming manner. The framework is implemented as a plug-and-play LLM module, initialized from pretrained LLMs and trained by reinforcement learning (RL) with emotions as the actions. A hybrid reward combines imitation signals with theory-driven scoring based on Plutchik’s wheel of emotions. In experiments on DailyDialog, EmoryNLP, IEMOCAP, and MELD, our method outperforms prompting and finetuning baselines on both emotion determination and response quality. Finally, we implement a complete streaming pipeline for real-time deployment; the resulting speech quality confirms the framework’s emotional alignment, contextual coherence, and expressive fluency. Code, cases, and demos are available at https://sixingdeguo.github.io/EmoQ-page/.
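The abstract does not spell out the reward, so the following is only a minimal sketch of how an imitation signal and a Plutchik-based score might be mixed into one scalar reward for an emotion action. The names `wheel_distance`, `theory_score`, `hybrid_reward`, and the weight `alpha` are hypothetical and are not taken from the paper. The only external fact used is the standard ordering of Plutchik's eight primary emotions around the wheel, where opposites (for example joy and sadness) sit four steps apart.

```python
# Illustrative sketch only: NOT the paper's reward definition.
# Plutchik's eight primary emotions in wheel order; opposites are 4 steps apart.
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]


def wheel_distance(a: str, b: str) -> int:
    """Shortest number of steps between two primary emotions on the wheel (0..4)."""
    i, j = WHEEL.index(a), WHEEL.index(b)
    d = abs(i - j)
    return min(d, len(WHEEL) - d)


def theory_score(pred: str, ref: str) -> float:
    """Assumed theory-driven score: 1.0 for the same emotion, 0.0 for opposites."""
    return 1.0 - wheel_distance(pred, ref) / 4


def hybrid_reward(pred: str, ref: str, alpha: float = 0.5) -> float:
    """Assumed mix: alpha weights exact imitation, (1 - alpha) weights wheel proximity."""
    imitation = 1.0 if pred == ref else 0.0
    return alpha * imitation + (1 - alpha) * theory_score(pred, ref)


if __name__ == "__main__":
    for pred in ["joy", "trust", "sadness"]:
        print(pred, hybrid_reward(pred, "joy"))
```

Under these assumptions, a predicted emotion adjacent to the reference on the wheel still earns partial reward, while the opposite emotion earns none. This is one plausible way that theory-driven scoring could soften a purely imitation-based signal.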
Anthology ID:
2026.findings-acl.740
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15038–15055
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.740/
Cite (ACL):
Yue Zhao, Hongyan Li, Yong Chen, and Luo Ji. 2026. Self-EmoQ: Plutchik-Guided Value-based Planning to Drive Streaming Emotional TTS. In Findings of the Association for Computational Linguistics: ACL 2026, pages 15038–15055, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Self-EmoQ: Plutchik-Guided Value-based Planning to Drive Streaming Emotional TTS (Zhao et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.740.pdf
Checklist:
 2026.findings-acl.740.checklist.pdf