Ryo Ishii


A Comparison of Praising Skills in Face-to-Face and Remote Dialogues
Toshiki Onishi | Asahi Ogushi | Yohei Tahara | Ryo Ishii | Atsushi Fukayama | Takao Nakamura | Akihiro Miyata
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Praising behavior is considered to an important method of communication in daily life and social activities. An engineering analysis of praising behavior is therefore valuable. However, a dialogue corpus for this analysis has not yet been developed. Therefore, we develop corpuses for face-to-face and remote two-party dialogues with ratings of praising skills. The corpuses enable us to clarify how to use verbal and nonverbal behaviors for successfully praise. In this paper, we analyze the differences between the face-to-face and remote corpuses, in particular the expressions in adjudged praising scenes in both corpuses, and also evaluated praising skills. We also compare differences in head motion, gaze behavior, facial expression in high-rated praising scenes in both corpuses. The results showed that the distribution of praising scores was similar in face-to-face and remote dialogues, although the ratio of the number of praising scenes to the number of utterances was different. In addition, we confirmed differences in praising behavior in face-to-face and remote dialogues.


Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data
Paul Pu Liang | Terrance Liu | Anna Cai | Michal Muszynski | Ryo Ishii | Nick Allen | Randy Auerbach | David Brent | Ruslan Salakhutdinov | Louis-Philippe Morency
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications for the early detection, intervention, and treatment of mental health disorders. One promising data source to help monitor human behavior is daily smartphone usage. However, care must be taken to summarize behaviors without identifying the user through personal (e.g., personally identifiable information) or protected (e.g., race, gender) attributes. In this paper, we study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors. Using computational models, we find that language and multimodal representations of mobile typed text (spanning typed characters, words, keystroke timings, and app usage) are predictive of daily mood. However, we find that models trained to predict mood often also capture private user identities in their intermediate representations. To tackle this problem, we evaluate approaches that obfuscate user identity while remaining predictive. By combining multimodal representations with privacy-preserving learning, we are able to push forward the performance-privacy frontier.


No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures
Chaitanya Ahuja | Dong Won Lee | Ryo Ishii | Louis-Philippe Morency
Findings of the Association for Computational Linguistics: EMNLP 2020

We study relationships between spoken language and co-speech gestures in context of two key challenges. First, distributions of text and gestures are inherently skewed making it important to model the long tail. Second, gesture predictions are made at a subword level, making it important to learn relationships between language and acoustic cues. We introduce AISLe, which combines adversarial learning with importance sampling to strike a balance between precision and coverage. We propose the use of a multimodal multiscale attention block to perform subword alignment without the need of explicit alignment between language and acoustic cues. Finally, to empirically study the importance of language in this task, we extend the dataset proposed in Ahuja et al. (2020) with automatically extracted transcripts for audio signals. We substantiate the effectiveness of our approach through large-scale quantitative and user studies, which show that our proposed methodology significantly outperforms previous state-of-the-art approaches for gesture generation. Link to code, data and videos: https://github.com/chahuja/aisle


Predicting Nods by using Dialogue Acts in Dialogue
Ryo Ishii | Ryuichiro Higashinaka | Junji Tomita
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Neural Dialogue Context Online End-of-Turn Detection
Ryo Masumura | Tomohiro Tanaka | Atsushi Ando | Ryo Ishii | Ryuichiro Higashinaka | Yushi Aono
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

This paper proposes a fully neural network based dialogue-context online end-of-turn detection method that can utilize long-range interactive information extracted from both speaker’s utterances and collocutor’s utterances. The proposed method combines multiple time-asynchronous long short-term memory recurrent neural networks, which can capture speaker’s and collocutor’s multiple sequential features, and their interactions. On the assumption of applying the proposed method to spoken dialogue systems, we introduce speaker’s acoustic sequential features and collocutor’s linguistic sequential features, each of which can be extracted in an online manner. Our evaluation confirms the effectiveness of taking dialogue context formed by the speaker’s utterances and collocutor’s utterances into consideration.