InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model
Yifu Chen, Shengpeng Ji, Ziqing Wang, Hanting Wang, Zhou Zhao
Abstract
Spoken Dialogue Models (SDMs) have achieved significant progress in recent years, yet they continue to face challenges in handling nuanced interactional phenomena. A significant bottleneck hindering further advancement is the scarcity of publicly available, high-quality datasets meticulously designed to train and evaluate these fine-grained interactive capabilities. We introduce InteractSpeech, a 150-hour English speech interaction dialogue dataset designed to empower spoken dialogue models with nuanced real-time interaction capabilities, such as handling interruptions and backchannels. InteractSpeech was created by synthesizing interactive dialogues from text using advanced speech synthesis, and by filtering real-world spoken dialogues for interactive segments. The dataset features precise speaker timestamps and annotations for diverse dialogue interactions, underpinned by a formal framework for interaction dynamics. We demonstrate InteractSpeech’s utility by fine-tuning a LLaMA 3-8B model on its textual scenarios and, crucially, by training a speech understanding model that accurately classifies key interactional events directly from audio. This highlights the dataset’s value in developing models capable of more natural and responsive conversational turn-taking. Audio samples are available at https://interactspeech.github.io/.- Anthology ID:
- 2025.findings-emnlp.424
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 8024–8033
- Language:
- URL:
- https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.424/
- DOI:
- 10.18653/v1/2025.findings-emnlp.424
- Cite (ACL):
- Yifu Chen, Shengpeng Ji, Ziqing Wang, Hanting Wang, and Zhou Zhao. 2025. InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 8024–8033, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model (Chen et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.424.pdf