Catherine Pelachaud

2025

“Mm, Wat?” Detecting Other-initiated Repair Requests in Dialogue
Anh Ha Ngo | Nicolas Rollet | Catherine Pelachaud | Chloé Clavel
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Maintaining mutual understanding is key to avoiding breakdowns in human-human conversation, and repair plays a vital role in this. Other-Initiated Repair (OIR) is particularly important: one speaker signals trouble and prompts the other to resolve it. However, Conversational Agents (CAs) still fail to recognize user repair initiation, which leads to breakdowns or disengagement. This work proposes a multimodal model that automatically detects repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve the results of pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues and exploring multilingual and cross-context corpora to assess robustness and generalizability.

Early Humorous Interaction: Towards a Formal Model
Yingqin Hu | Jonathan Ginzburg | Catherine Pelachaud
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Current computational models for humour recognition and laughter generation in dialogue systems face significant limitations in explainability, context consideration and adaptability. This paper addresses these challenges by investigating how humour recognition develops in its earliest forms, during the first year of life. Drawing on developmental psychology and cognitive science, we propose a formal model incorporated within the KoS dialogue framework. This model captures how infants evaluate potential humour through knowledge-based appraisal and context-dependent modulation, including safety, emotional state, and social cues. Our model formalises dynamic knowledge updates during dyadic interaction. We believe that this formal model can serve as the basis for developing more natural humour appreciation capabilities in dialogue systems and can be implemented in a robotic platform.

2024

Beyond Words: Decoding Facial Expression Dynamics in Motivational Interviewing
Nezih Younsi | Catherine Pelachaud | Laurence Chaby
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper studies the facial expressions of both client and therapist in Motivational Interviewing (MI). The Motivational Interviewing Skill Code (MISC) annotation system defines three types of talk for the client (sustain, change, and neutral) and three for the therapist (information, question, and reflection). Most studies on MI look only at the verbal modality. Our research aims to understand the variation and dynamics of both interlocutors' facial expressions over a counseling session. We apply a sequence mining algorithm to identify categories of facial expressions for each type of talk. Using co-occurrence analysis, we derive the correlation between facial expressions and the different types of talk, as well as the interplay between the interlocutors’ expressions.

Generating Unexpected yet Relevant User Dialog Acts
Lucie Galland | Catherine Pelachaud | Florian Pecune
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

The demand for mental health services has risen substantially in recent years, making it difficult to meet patient needs promptly. Virtual agents capable of emulating motivational interviews (MI) have emerged as a potential solution. They offer immediate support, which is especially beneficial for therapy modalities that require multiple sessions. However, developing effective patient simulation methods for training MI dialog systems is challenging. The main difficulty is generating dialog acts that are syntactically and contextually correct as well as diverse, while still respecting existing patterns and trends in therapy data. This paper investigates data-driven approaches to simulating patients for training MI dialog systems. We propose a novel method that leverages time series models to generate diverse and contextually appropriate patient dialog acts, which are then transformed into utterances by a conditioned large language model. Additionally, we introduce evaluation measures tailored to assess the quality and coherence of simulated patient dialog. Our findings highlight the effectiveness of dialog act-conditioned approaches in improving patient simulation for MI, offering insights for developing virtual agents to support mental health therapy.

Exploration of Human Repair Initiation in Task-oriented Dialogue: A Linguistic Feature-based Approach
Anh Ngo | Dirk Heylen | Nicolas Rollet | Catherine Pelachaud | Chloé Clavel
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In daily conversations, people often encounter problems that prompt conversational repair to enhance mutual understanding. Using an automatic coreference resolver and an examination of repetition, we identify various linguistic features that distinguish turns in which the addressee initiates repair from turns in which they do not. Our findings reveal distinct patterns that characterize the repair sequence and each type of repair initiation.

2022

Annotating Interruption in Dyadic Human Interaction
Liu Yang | Catherine Achard | Catherine Pelachaud
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Integrating existing interruption and turn-switch classification methods, we propose a new annotation schema that annotates different types of interruptions at three levels: timeliness, switch accomplishment and speech content. The proposed method can distinguish smooth turn exchanges, backchannels and interruptions (including interruption types), and can be used to annotate dyadic conversation. We annotated the French part of the NoXi corpus with the proposed schema. We then used these annotations to study the probability distribution and duration of each turn-switch type.

Proceedings of the Workshop on Smiling and Laughter across Contexts and the Life-span within the 13th Language Resources and Evaluation Conference
Chiara Mazzocconi | Kevin El Haddad | Catherine Pelachaud | Gary McKeown
Proceedings of the Workshop on Smiling and Laughter across Contexts and the Life-span within the 13th Language Resources and Evaluation Conference

2020

Multimodal Analysis of Cohesion in Multi-party Interactions
Reshmashree Bangalore Kantharaju | Caroline Langlet | Mukesh Barange | Chloé Clavel | Catherine Pelachaud
Proceedings of the Twelfth Language Resources and Evaluation Conference

Group cohesion is an emergent phenomenon that describes group members’ shared commitment to group tasks and their interpersonal attraction to one another. This paper presents a multimodal analysis of group cohesion using a corpus of multi-party interactions. We use 16 two-minute segments from the AMI corpus that are annotated for cohesion. We define three layers of modalities: non-verbal social cues, dialogue acts and interruptions. The analysis is first performed for each modality individually. We then combine the modalities to observe their impact on the perceived level of cohesion. Results indicate that laughter and interruptions occur more frequently in highly cohesive segments. We also observe that dialogue acts and head nods, taken individually, have no impact on the perceived level of cohesion; when combined, however, they do. Overall, the analysis shows that multimodal cues are crucial for accurate analysis of group cohesion.

ISO standard 24617-2 for dialogue act annotation, established in 2012, has in the past few years been used both in corpus annotation and in the design of components for spoken and multimodal dialogue systems. This use has brought some inaccuracies and undesirable limitations of the standard to light, and a proposed second edition addresses them. This second edition allows more accurate annotation of dependence relations and rhetorical relations in dialogue. Following the ISO 24617-4 principles of semantic annotation, and borrowing ideas from EmotionML, it introduces a triple-layered plug-in mechanism. The mechanism allows dialogue act descriptions to be enriched with information about their semantic content, accompanying emotions, and other information. It also allows the annotation scheme to be customised by adding application-specific dialogue act types.

Interpersonal attitudes are expressed by non-verbal behaviors across a variety of modalities. The perception of these behaviors is influenced by how they are sequenced with other behaviors from the same person and with behaviors from other interactants. In this paper, we present a method for extracting and generating sequences of non-verbal signals that express interpersonal attitudes. These sequences are used as part of a framework for non-verbal expression with Embodied Conversational Agents that considers different features of non-verbal behavior: global behavior tendencies, interpersonal reactions, sequencing of non-verbal signals, and communicative intentions. Our method uses a sequence mining technique on an annotated multimodal corpus to extract sequences characteristic of different attitudes. New sequences of non-verbal signals are generated using a probabilistic model and evaluated against the previously mined sequences.

2014

Emilya: Emotional body expression in daily actions database
Nesrine Fourati | Catherine Pelachaud
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Studies of the bodily expression of emotion have so far mostly focused on body movement patterns associated with emotional expression. Recently, there has been increasing interest in the expression of emotion in daily actions, also called non-emblematic movements (such as walking or knocking at the door). Previous studies relied on databases limited to a small range of movement tasks or emotional states. In this paper, we describe our new database of emotional body expression in daily actions, in which 11 actors express 8 emotions in 7 actions. We use motion capture technology to record body movements. We also recorded synchronized audio-visual data so that the database can serve a wider range of research purposes. In addition, we investigate the match between expressed and perceived emotions through a perceptual study. The first results of this study are discussed in this paper.

A model to generate adaptive multimodal job interviews with a virtual recruiter
Zoraida Callejas | Brian Ravenet | Magalie Ochs | Catherine Pelachaud
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents an adaptive model of multimodal social behavior for embodied conversational agents. The context of this research is training young people for job interviews in a serious game in which the agent plays the role of a virtual recruiter. With the proposed model, the agent adapts its social behavior according to the trainee's anxiety level and a predefined difficulty level of the game. This information is used to select the system's objective (to challenge or to comfort the user). The agent achieves this objective by selecting the complexity of the next question it poses and its verbal and non-verbal behavior. We carried out a perceptual study, which shows that the multimodal behavior of an agent implementing our model successfully conveys the expected social attitudes.

2010

This paper presents the large audiovisual laughter database recorded as part of the AVLaughterCycle project, held during the eNTERFACE09 Workshop in Genova. Twenty-four subjects participated. The freely available database includes audio and video recordings as well as facial motion tracking, obtained from markers placed on the subjects' faces. Annotations of the recordings, focusing on laughter description, are also provided and presented in this paper. In total, the corpus contains more than 1000 spontaneous laughs and 27 acted laughs. The laughter utterances are highly variable. Laughter duration ranges from 250 ms to 82 s, and the sounds include voiced vowels, breath-like expirations, and hum-, hiccup- or grunt-like sounds, among others. However, because the subjects had no one to interact with, the database contains very few speech-laughs. Acted laughs tend to be longer than spontaneous ones and are more often composed of voiced vowels. The database can be useful for automatic laughter processing and for cognitive science research. In the AVLaughterCycle project, it was used to animate a laughing virtual agent whose output laugh is linked to the conversational partner's input laugh.