Ryo Ishii


2021

Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data
Paul Pu Liang | Terrance Liu | Anna Cai | Michal Muszynski | Ryo Ishii | Nick Allen | Randy Auerbach | David Brent | Ruslan Salakhutdinov | Louis-Philippe Morency
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications for the early detection, intervention, and treatment of mental health disorders. One promising data source to help monitor human behavior is daily smartphone usage. However, care must be taken to summarize behaviors without identifying the user through personal (e.g., personally identifiable information) or protected (e.g., race, gender) attributes. In this paper, we study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors. Using computational models, we find that language and multimodal representations of mobile typed text (spanning typed characters, words, keystroke timings, and app usage) are predictive of daily mood. However, we find that models trained to predict mood often also capture private user identities in their intermediate representations. To tackle this problem, we evaluate approaches that obfuscate user identity while remaining predictive. By combining multimodal representations with privacy-preserving learning, we are able to push forward the performance-privacy frontier.
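A minimal sketch of one common way to obfuscate user identity in learned representations, using an identity adversary trained through a gradient-reversal layer. This is an illustrative reading of "privacy-preserving learning" under assumed module names and dimensions, not the authors' implementation; the mood head, identity head, and fused multimodal input feature are all hypothetical.

```python
# Hedged sketch: adversarial identity obfuscation on top of a shared
# multimodal encoder. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; reverses and scales gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class PrivacyPreservingMoodModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, n_users, lam=0.5):
        super().__init__()
        self.lam = lam
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.mood_head = nn.Linear(hidden_dim, 1)             # daily mood score
        self.identity_head = nn.Linear(hidden_dim, n_users)   # adversary

    def forward(self, x):
        z = self.encoder(x)
        mood = self.mood_head(z)
        # The adversary sees the same representation, but reversed gradients
        # push the encoder to strip user-identifying information from z.
        identity_logits = self.identity_head(GradientReversal.apply(z, self.lam))
        return mood, identity_logits

# Toy usage: 32 samples of a 64-d fused (text + keystroke + app-usage) feature.
model = PrivacyPreservingMoodModel(input_dim=64, hidden_dim=32, n_users=10)
x = torch.randn(32, 64)
mood_target = torch.randn(32, 1)
user_ids = torch.randint(0, 10, (32,))
mood_pred, id_logits = model(x)
loss = nn.functional.mse_loss(mood_pred, mood_target) \
     + nn.functional.cross_entropy(id_logits, user_ids)
loss.backward()
```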

2020

No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures
Chaitanya Ahuja | Dong Won Lee | Ryo Ishii | Louis-Philippe Morency
Findings of the Association for Computational Linguistics: EMNLP 2020

We study relationships between spoken language and co-speech gestures in the context of two key challenges. First, distributions of text and gestures are inherently skewed, making it important to model the long tail. Second, gesture predictions are made at a subword level, making it important to learn relationships between language and acoustic cues. We introduce AISLe, which combines adversarial learning with importance sampling to strike a balance between precision and coverage. We propose the use of a multimodal multiscale attention block to perform subword alignment without the need for explicit alignment between language and acoustic cues. Finally, to empirically study the importance of language in this task, we extend the dataset proposed in Ahuja et al. (2020) with automatically extracted transcripts for audio signals. We substantiate the effectiveness of our approach through large-scale quantitative and user studies, which show that our proposed methodology significantly outperforms previous state-of-the-art approaches for gesture generation. Link to code, data and videos: https://github.com/chahuja/aisle
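A minimal sketch of one possible reading of a "multimodal multiscale attention block": language tokens attend over acoustic frames pooled at several temporal scales, so alignment emerges from attention rather than from explicit word-audio alignment. This is an assumption-laden illustration, not the AISLe implementation; the class name, scales, and dimensions are hypothetical.

```python
# Hedged sketch: cross-modal attention between subword-level language features
# and acoustic frames pooled at multiple temporal scales. Illustrative only.
import torch
import torch.nn as nn

class MultiscaleCrossModalAttention(nn.Module):
    def __init__(self, dim=128, scales=(1, 4), n_heads=4):
        super().__init__()
        self.scales = scales
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, n_heads, batch_first=True) for _ in scales]
        )
        self.out = nn.Linear(dim * len(scales), dim)

    def forward(self, text_feats, audio_feats):
        # text_feats:  (batch, n_tokens, dim)  language queries
        # audio_feats: (batch, n_frames, dim)  acoustic keys/values
        fused = []
        for scale, attn in zip(self.scales, self.attn):
            # Pool acoustic frames to a coarser resolution before attending,
            # so each query sees both fine and coarse acoustic context.
            pooled = nn.functional.avg_pool1d(
                audio_feats.transpose(1, 2), kernel_size=scale, stride=scale
            ).transpose(1, 2)
            attended, _ = attn(text_feats, pooled, pooled)
            fused.append(attended)
        return self.out(torch.cat(fused, dim=-1))

# Toy usage: 10 subword tokens attending over 80 acoustic frames.
block = MultiscaleCrossModalAttention()
text = torch.randn(2, 10, 128)
audio = torch.randn(2, 80, 128)
print(block(text, audio).shape)  # torch.Size([2, 10, 128])
```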

2018

Neural Dialogue Context Online End-of-Turn Detection
Ryo Masumura | Tomohiro Tanaka | Atsushi Ando | Ryo Ishii | Ryuichiro Higashinaka | Yushi Aono
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

This paper proposes a fully neural, dialogue-context online end-of-turn detection method that can exploit long-range interactive information extracted from both the speaker's and the collocutor's utterances. The proposed method combines multiple time-asynchronous long short-term memory recurrent neural networks, which capture multiple sequential features of the speaker and the collocutor as well as their interactions. Assuming the method is deployed in spoken dialogue systems, we use the speaker's acoustic sequential features and the collocutor's linguistic sequential features, each of which can be extracted in an online manner. Our evaluation confirms the effectiveness of taking into account the dialogue context formed by the speaker's and the collocutor's utterances.
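A minimal sketch of the general idea, assuming a simple two-stream setup: one LSTM over the speaker's acoustic frames and one over the collocutor's word sequence, whose final states are fused for a binary end-of-turn decision. Dimensions, names, and the single-layer fusion are assumptions for illustration, not the authors' configuration.

```python
# Hedged sketch: two time-asynchronous LSTM streams (speaker acoustics at the
# frame level, collocutor language at the word level) fused for binary
# end-of-turn prediction. Illustrative dimensions and names.
import torch
import torch.nn as nn

class EndOfTurnDetector(nn.Module):
    def __init__(self, acoustic_dim=40, vocab_size=5000, embed_dim=64, hidden=128):
        super().__init__()
        self.acoustic_lstm = nn.LSTM(acoustic_dim, hidden, batch_first=True)
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.linguistic_lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 1)  # end-of-turn vs. not

    def forward(self, speaker_frames, collocutor_words):
        # The two streams run over sequences of different lengths and rates
        # ("time-asynchronous"); only their final hidden states are fused here.
        _, (h_acoustic, _) = self.acoustic_lstm(speaker_frames)
        _, (h_linguistic, _) = self.linguistic_lstm(self.word_embed(collocutor_words))
        fused = torch.cat([h_acoustic[-1], h_linguistic[-1]], dim=-1)
        return torch.sigmoid(self.classifier(fused))

# Toy usage: 200 acoustic frames from the current speaker, 12 words from the
# collocutor's preceding utterance.
detector = EndOfTurnDetector()
frames = torch.randn(2, 200, 40)
words = torch.randint(0, 5000, (2, 12))
print(detector(frames, words).shape)  # torch.Size([2, 1])
```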

Predicting Nods by using Dialogue Acts in Dialogue
Ryo Ishii | Ryuichiro Higashinaka | Junji Tomita
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)