Takao Obi
2025
Integrating Respiration into Voice Activity Projection for Enhancing Turn-taking Performance
Takao Obi | Kotaro Funakoshi
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology
Voice Activity Projection (VAP) models predict upcoming voice activities on a continuous timescale, enabling more nuanced turn-taking behaviors in spoken dialogue systems. Although previous studies have shown robust performance with audio-based VAP, the potential of incorporating additional physiological information, such as respiration, remains relatively unexplored. In this paper, we investigate whether respiratory information can enhance VAP performance in turn-taking. To this end, we collected Japanese dialogue data with synchronized audio and respiratory waveforms, and then we integrated the respiratory information into the VAP model. Our results showed that the VAP model combining audio and respiratory information had better performance than the audio-only model. This finding underscores the potential for improving the turn-taking performance of VAP by incorporating respiration.
2024
Using Respiration for Enhancing Human-Robot Dialogue
Takao Obi | Kotaro Funakoshi
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
This paper presents the development and capabilities of a spoken dialogue robot that uses respiration to enhance human-robot dialogue. By employing a respiratory estimation technique that uses video input, the dialogue robot captures user respiratory information during dialogue. This information is then used to prevent speech collisions between the user and the robot and to present synchronized pseudo-respiration with the user, thereby enhancing the smoothness and engagement of human-robot dialogue.