This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
LaurenceDevillers
Also published as:
L. Devillers
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
This paper describes a data collection methodology and emotion annotation of dyadic interactions between a human, a Pepper robot, a Google Home smart-speaker, or another human. The collected 16 hours of audio recordings were used to analyze the propensity to change someone’s opinions about ecological behavior regarding the type of conversational agent, the kind of nudges, and the speaker’s emotional state. We describe the statistics of data collection and annotation. We also report the first results, which showed that humans change their opinions on more questions with a human than with a device, even against mainstream ideas. We observe a correlation between a certain emotional state and the interlocutor and a human’s propensity to be influenced. We also reported the results of the studies that investigated the effect of human likeness on speech using our data.
“Robot-directed speech” désigne la parole adressée à un appareil robotique, des petites enceintes domestiques aux robots humanoïdes grandeur-nature. Les études passées ont analysé les propriétés phonétiques et linguistiques de ce type de parole ou encore l’effet de l’anthropomorphisme des appareils sur la sociabilité des interactions, mais l’effet de l’anthropomorphisme sur les réalisations linguistiques n’a encore jamais été exploré. Notre étude propose de combler ce manque avec l’analyse d’un paramètre phonétique (débit de parole) et d’un paramètre linguistique (fréquence des pauses remplies) sur la parole adressée à l’enceinte vs. au robot humanoïde vs. à l’humain. Les données de 71 francophones natifs indiquent que les énoncés adressés aux humains sont plus longs, plus rapides et plus dysfluents que ceux adressés à l’enceinte et au robot. La parole adressée à l’enceinte et au robot est significativement différente de la parole adressée à l’humain, mais pas l’une de l’autre, indiquant l’existence d’un type particulier de la parole adressée aux machines.
In this paper, we present the methodology of corpus design that will be used to study the comparison of influence between linguistic nudges with positive or negative influences and three conversational agents: robot, smart speaker, and human. We recruited forty-nine participants to form six groups. The conversational agents first asked the participants about their willingness to adopt five ecological habits and invest time and money in ecological problems. The participants were then asked the same questions but preceded by one linguistic nudge with positive or negative influence. The comparison of standard deviation and mean metrics of differences between these two notes (before the nudge and after) showed that participants were mainly affected by nudges with positive influence, even though several nudges with negative influence decreased the average note. In addition, participants from all groups were willing to spend more money than time on ecological problems. In general, our experiment’s early results suggest that a machine agent can influence participants to the same degree as a human agent. A better understanding of the power of influence of different conversational machines and the potential of influence of nudges of different polarities will lead to the development of ethical norms of human-computer interactions.
This article presents a corpus featuring adults playing games in interaction with machine trying to induce laugh. This corpus was collected during Interspeech 2013 in Lyon to study behavioral differences correlated to different personalities and cultures. We first present the collection protocol, then the corpus obtained and finally different quantitative and qualitative measures. Smiles and laughs are types of affect bursts which are defined as short emotional non-speech expressions. Here we correlate smile and laugh with personality traits and cultural background. Our final objective is to propose a measure of engagement deduced from those affect bursts.
This article presents a corpus featuring children playing games in interaction with the humanoid robot Nao: children have to express emotions in the course of a storytelling by the robot. This corpus was collected to design an affective interactive system driven by an interactional and emotional representation of the user. We evaluate here some mid-level markers used in our system: reaction time, speech duration and intensity level. We also question the presence of affect bursts, which are quite numerous in our corpus, probably because of the young age of the children and the absence of predefined lexical content.
In this paper we describe a corpus set together from two sub-corpora. The CINEMO corpus contains acted emotional expression obtained by playing dubbing exercises. This new protocol is a way to collect mood-induced data in large amount which show several complex and shaded emotions. JEMO is a corpus collected with an emotion-detection game and contains more prototypical emotions than CINEMO. We show how the two sub-corpora balance and enrich each other and result in a better performance. We built male and female emotion models and use Sequential Fast Forward Feature Selection to improve detection performances. After feature-selection we obtain good results even with our strict speaker independent testing method. The global corpus contains 88 speakers (38 females, 50 males). This study has been done within the scope of the ANR (National Research Agency) Affective Avatar project which deals with building a system of emotions detection for monitoring an Artificial Agent by voice.
The CINEMO corpus of French emotional speech provides a richly annotated resource to help overcome the apparent lack of learning and testing speech material for complex, i.e. blended or mixed emotions. The protocol for its collection was dubbing selected emotional scenes from French movies. 51 speakers are contained and the total speech time amounts to 2 hours and 13 minutes and 4k speech chunks after segmentation. Extensive labelling was carried out in 16 categories for major and minor emotions and in 6 continuous dimensions. In this contribution we give insight into the corpus statistics focusing in particular on the topic of complex emotions, and provide benchmark recognition results obtained in exemplary large feature space evaluations. In the result the labelling oft he collected speech clearly demonstrates that a complex handling of emotion seems needed. Further, the automatic recognition experiments provide evidence that the automatic recognition of blended emotions appears to be feasible.
The modelling of realistic emotional behaviour is needed for various applications in multimodal human-machine interaction such as the design of emotional conversational agents (Martin et al., 2005) or of emotional detection systems (Devillers and Vidrascu, 2007). Yet, building such models requires appropriate definition of various levels for representing the emotions themselves but also some contextual information such as the events that elicit these emotions. This paper presents a coding scheme that has been defined following annotations of a corpus of TV interviews (EmoTV). Deciding which events triggered or may trigger which emotion is a challenge for building efficient emotion eliciting protocols. In this paper, we present the protocol that we defined for collecting another corpus of spontaneous human-human interactions recorded in laboratory conditions (EmoTaboo). We discuss the events that we designed for eliciting emotions. Part of this scheme for coding emotional event is being included in the specifications that are currently defined by a working group of the W3C (the W3C Emotion Incubator Working group). This group is investigating the feasibility of working towards a standard representation of emotions and related states in technological contexts.
The present research focuses on annotation issues in the context of the acoustic detection of fear-type emotions for surveillance applications. The emotional speech material used for this study comes from the previously collected SAFE Database (Situation Analysis in a Fictional and Emotional Database) which consists of audio-visual sequences extracted from movie fictions. A generic annotation scheme was developed to annotate the various emotional manifestations contained in the corpus. The annotation was carried out by two labellers and the two annotations strategies are confronted. It emerges that the borderline between emotion and neutral vary according to the labeller. An acoustic validation by a third labeller allows at analysing the two strategies. Two human strategies are then observed: a first one, context-oriented which mixes audio and contextual (video) information in emotion categorization; and a second one, based mainly on audio information. The k-means clustering confirms the role of audio cues in human annotation strategies. It particularly helps in evaluating those strategies from the point of view of a detection system based on audio cues.
A major barrier to the development of accurate and realistic models of human emotions is the absence of multi-cultural / multilingual databases of real-life behaviours and of a federative and reliable annotation protocol. QUB and LIMSI teams are working towards the definition of an integrated coding scheme combining their complementary approaches. This multilevel integrated scheme combines the dimensions that appear to be useful for the study of real-life emotions: verbal labels, abstract dimensions and contextual (appraisal based) annotations. This paper describes this integrated coding scheme, a protocol that was set-up for annotating French and English video clips of emotional interviews and the results (e.g. inter-coder agreement measures and subjective evaluation of the scheme).
There has been a lot of psychological researches on emotion and nonverbal communication. Yet, these studies were based mostly on acted basic emotions. This paper explores how manual annotation and image processing can cooperate towards the representation of spontaneous emotional behaviour in low resolution videos from TV. We describe a corpus of TV interviews and the manual annotations that have been defined. We explain the image processing algorithms that have been designed for the automatic estimation of movement quantity. Finally, we explore how image processing can be used for the validation of manual annotations.
Research on emotional real-life data has to tackle the problem of their annotation. The annotation of emotional corpora raises the issue of how different coders perceive the same multimodal emotional behaviour. The long-term goal of this paper is to produce a guideline for the selection of annotators. The LIMSI team is working towards the definition of a coding scheme integrating emotion, context and multimodal annotations. We present the current defined coding scheme for emotion annotation, and the use of soft vectors for representing a mixture of emotions. This paper describes a perceptive test of emotion annotations and the results obtained with 40 different coders on a subset of complex real-life emotional segments selected from the EmoTV Corpus collected at LIMSI. The results of this first study validate previous annotations of emotion mixtures and highlight the difference of annotation between male and female coders.
The aim of the MEDIA project is to design and test a methodology for the evaluat ion of context-dependent and independent spoken dialogue systems. We propose an evaluation paradigm based on the use of test suites from real-world corpora and a common semantic representation and common metrics. This paradigm should allow us to diagnose the context-sensitive understanding capability of dialogue system s. This paradigm will be used within an evaluation campaign involving several si tes all of which will carry out the task of querying information from a database .