InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model

Yifu Chen, Shengpeng Ji, Ziqing Wang, Hanting Wang, Zhou Zhao


Abstract
Spoken Dialogue Models (SDMs) have made significant progress in recent years, yet they still struggle with nuanced interactional phenomena. A major bottleneck is the scarcity of publicly available, high-quality datasets designed specifically to train and evaluate these fine-grained interactive capabilities. We introduce InteractSpeech, a 150-hour English spoken dialogue dataset designed to equip spoken dialogue models with nuanced real-time interaction capabilities, such as handling interruptions and backchannels. InteractSpeech was built in two ways: by synthesizing interactive dialogues from text with advanced speech synthesis, and by filtering real-world spoken dialogues for interactive segments. The dataset provides precise speaker timestamps and annotations for diverse dialogue interactions, grounded in a formal framework for interaction dynamics. We demonstrate InteractSpeech’s utility by fine-tuning a LLaMA 3-8B model on its textual scenarios and, more importantly, by training a speech understanding model that accurately classifies key interactional events directly from audio. These results highlight the dataset’s value for developing models capable of more natural and responsive conversational turn-taking. Audio samples are available at https://interactspeech.github.io/.
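The abstract does not spell out the formal framework for interaction dynamics, so the following is only a minimal sketch, under stated assumptions, of how per-speaker timestamps can separate the two phenomena it names, backchannels and interruptions. The `Segment` type, the `classify_overlap` rule, its label set, and the 1.0 s threshold are all hypothetical illustrations and are not the InteractSpeech annotation scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One speaker's speech span, with timestamps in seconds."""
    speaker: str
    start: float
    end: float


# Hypothetical threshold (not from the paper): overlapping speech no longer
# than this, during which the current speaker keeps talking, is a backchannel.
MAX_BACKCHANNEL_S = 1.0


def classify_overlap(current: Segment, incoming: Segment,
                     max_backchannel: float = MAX_BACKCHANNEL_S) -> str:
    """Label an incoming segment relative to the current speaker's segment.

    Illustrative rule only; it is not the InteractSpeech annotation framework.
    """
    if incoming.speaker == current.speaker:
        raise ValueError("segments must come from different speakers")
    if incoming.start < current.start:
        raise ValueError("incoming segment must start after the current one")
    # No overlap: the listener waited for the current turn to finish.
    if incoming.start >= current.end:
        return "turn_switch"
    duration = incoming.end - incoming.start
    # Short overlapping speech while the current speaker keeps the floor.
    if duration <= max_backchannel and current.end >= incoming.end:
        return "backchannel"
    # The current speaker stops while the incoming speaker is still talking.
    if current.end < incoming.end:
        return "interruption"
    # Longer overlapping speech that does not take the floor.
    return "overlap"


if __name__ == "__main__":
    a = Segment("A", 0.0, 5.0)
    for b in (Segment("B", 2.0, 2.6), Segment("B", 3.0, 7.0),
              Segment("B", 5.2, 8.0), Segment("B", 1.0, 4.0)):
        print(b, "->", classify_overlap(a, b))
```

With speaker A talking from 0.0 s to 5.0 s, the four sample B segments are labeled backchannel, interruption, turn_switch and overlap, respectively. A rule this simple ignores content (a short "wait, no" is an interruption, not a backchannel), which is one reason a learned classifier over audio, as described in the abstract, is attractive.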
Anthology ID: 2025.findings-emnlp.424
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8024–8033
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.424/
DOI: 10.18653/v1/2025.findings-emnlp.424
Cite (ACL): Yifu Chen, Shengpeng Ji, Ziqing Wang, Hanting Wang, and Zhou Zhao. 2025. InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 8024–8033, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model (Chen et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.424.pdf
Checklist: 2025.findings-emnlp.424.checklist.pdf