Elisabeth Andre
Also published as: Elisabeth André
2026
MUDiC: A Dataset for Multi-User Dialogue and Collaboration in Chatbot Interaction
Nicolas Wagner | Cristina Luna Jimenez | Elisabeth Andre | Wolfgang Minker | Stefan Ultes
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We introduce MUDiC, a novel dataset of task-based multi-user interactions with chatbots. Unlike most traditional dialogue corpora, which focus on one-to-one human–chatbot exchanges, this dataset captures conversations in which two human participants engage with a single system. The data include diverse conversational contexts such as shared group tasks, user intents, and mechanisms for dealing with off-topic talk. MUDiC consists of 1,689 dialogue exchanges between 20 groups and the chatbot. Each session is annotated with user IDs, interaction turns, intents, and dialogue acts, enabling an analysis of group conversational dynamics. The dataset thus aims to support tasks such as multi-user dialogue modelling, intent disambiguation, and moderation behaviour, which are relevant factors in the design of socially aware chatbots.
Annotating Conversational Phases and Communication Techniques: A Corpus of German Teacher-Parent Counseling Conversations
Tobias Hallmen | Kathrin Gietl | Karoline Hillesheim | Annemarie Friedrich | Elisabeth André
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Teacher-parent conversations are critical for student success, yet teachers often lack structured training in counseling communication skills. We present the first annotated corpus of teacher-parent counseling conversations consisting of 59 German dialogues (approximately 6k sentences, 21k annotations) simulated by prospective elementary school teachers, peers, and professional actors. The corpus features theory-grounded annotations for conversational phases (Beginning, Informational, Argumentative, Decision-Making, Concluding) and communication techniques (Paraphrasing, Verbalizing, Structuring). We provide detailed annotation guidelines operationalizing established counseling pedagogy frameworks for computational analysis. Inter-annotator agreement analysis reveals substantial agreement (Fleiss’ κ = 0.669 to 0.724, Krippendorff’s α = 0.666 to 0.735). Our analysis reveals confusion patterns, providing insights into counseling discourse structure. Baseline experiments with BERT-based models and open-source LLMs achieve F1 scores of up to 71% depending on task and model. The corpus, guidelines, and baseline code are publicly available under a CC BY-NC-SA 4.0 license, enabling research on automated dialogue analysis and AI-based training tools for teacher education.
Evaluation of Failure Communication Strategies for Trust Repair in Human-AI Collaboration
Stina Klein | Alexandru Wurm | Elisabeth Andre | Matthias Kraus
Proceedings of the Fifteenth Language Resources and Evaluation Conference
The increasing use of Large Language Models (LLMs) in everyday tasks and at work highlights the crucial importance of trust in human-AI collaboration, particularly when an AI system fails. This paper investigates the effectiveness of failure communication strategies for trust repair in collaborative physical tasks involving a chat-based AI assistant. A controlled experiment in which participants built LEGO cars guided by an LLM-based AI assistant was used to evaluate whether findings on trust repair in virtual environments, such as chatbots, translate to an environment comprising tangible tasks, and whether the timing of trust repair influences the outcome. Results indicate that actively communicating mistakes significantly improves trust compared to a no-repair strategy and that early repair tends to be more effective, suggesting that failure communication, regardless of its timing, is important for an appropriate calibration of trust.
2025
CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding
Johannes Kirmayr | Lukas Stappen | Phillip Schneider | Florian Matthes | Elisabeth Andre
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
In today’s assistant landscape, personalisation enhances interactions, fosters long-term relationships, and deepens engagement. However, many systems struggle with retaining user preferences, leading to repetitive user requests and disengagement. Furthermore, the unregulated and opaque extraction of user preferences in industry applications raises significant concerns about privacy and trust, especially in regions with stringent regulations such as Europe. In response to these challenges, we propose a long-term memory system for voice assistants, structured around predefined categories. This approach leverages Large Language Models to efficiently extract, store, and retrieve preferences within these categories, ensuring both personalisation and transparency. We also introduce a synthetic multi-turn, multi-session conversation dataset (CarMem), grounded in real industry data and tailored to an in-car voice assistant setting. Benchmarked on this dataset, our system achieves an F1-score of .78 to .95 in preference extraction, depending on category granularity. Our maintenance strategy reduces redundant preferences by 95% and contradictory ones by 92%, while the accuracy of optimal retrieval is .87. Collectively, the results demonstrate the system’s suitability for industrial applications.
On Speakers’ Identities, Autism Self-Disclosures and LLM-Powered Robots
Sviatlana Hoehn | Fred Philippy | Elisabeth Andre
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Dialogue agents become more engaging through recipient design, which requires user-specific information. However, a user’s identification with marginalized communities, such as those with a migration or disability background, can elicit biased language. This study compares LLM responses to neurodivergent user personas with disclosed vs. masked neurodivergent identities. A dataset built from public Instagram comments was used to evaluate four open-source models on story generation, dialogue generation, and retrieval-augmented question answering. Our analyses show biases in users’ identity construction across all models and tasks. Binary classifiers trained on each model’s outputs can distinguish between language generated for prompts with or without self-disclosures, with stronger biases linked to more explicit disclosures. Some models’ safety mechanisms result in denial-of-service behaviors. LLMs’ recipient design for neurodivergent identities relies on stereotypes tied to neurodivergence.
2021
AVASAG: A German Sign Language Translation System for Public Services (short paper)
Fabrizio Nunnari | Judith Bauerdiek | Lucas Bernhard | Cristina España-Bonet | Corinna Jäger | Amelie Unger | Kristoffer Waldow | Sonja Wecker | Elisabeth André | Stephan Busemann | Christian Dold | Arnulph Fuhrmann | Patrick Gebhard | Yasser Hamidullah | Marcel Hauck | Yvonne Kossel | Martin Misiak | Dieter Wallach | Alexander Stricker
Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)
This paper presents an overview of AVASAG, an ongoing applied-research project developing a text-to-sign-language translation system for public services. We describe the scientific innovation points (geometry-based SL description, 3D animation and video corpus, simplified annotation scheme, motion capture strategy) and the overall translation pipeline.
“It’s our fault!”: Insights Into Users’ Understanding and Interaction With an Explanatory Collaborative Dialog System
Katharina Weitz | Lindsey Vanderlyn | Ngoc Thang Vu | Elisabeth André
Proceedings of the 25th Conference on Computational Natural Language Learning
Human-AI collaboration, a long-standing goal in AI, refers to a partnership where a human and an artificial intelligence work together towards a shared goal. Collaborative dialog allows human-AI teams to communicate and leverage the strengths of both partners. To design collaborative dialog systems, it is important to understand what mental models users form about their AI dialog partners; however, how users perceive these systems is not fully understood. In this study, we designed a novel, collaborative, communication-based puzzle game and an explanatory dialog system. We created a public corpus from 117 conversations and post-surveys and used it to analyze what mental models users formed. Key takeaways include: even when users were not engaged in the game, they perceived the AI dialog partner as intelligent and likeable, implying they saw it as a partner separate from the game. This was further supported by users often overestimating the system’s abilities and projecting human-like attributes onto it, which led to miscommunications. We conclude that creating shared mental models between users and AI systems is important for achieving successful dialogs. We propose that our insights on mental models and miscommunication, the game, and our corpus provide useful tools for designing collaborative dialog systems.
2018
Shaping a social robot’s humor with Natural Language Generation and socially-aware reinforcement learning
Hannes Ritschel | Elisabeth André
Proceedings of the Workshop on NLG for Human–Robot Interaction
Humor plays an important role in human interaction: it regulates conversations and increases interpersonal attraction and trust. For social robots, humor is one way to make interactions more natural and enjoyable and to increase credibility and acceptance. In combination with appropriate non-verbal behavior, natural language generation offers the ability to create content on the fly. This work outlines the building blocks for providing an individual, multimodal interaction experience by shaping the robot’s humor with the help of Natural Language Generation and Reinforcement Learning based on human social signals.
2006
Improving Automatic Emotion Recognition from Speech via Gender Differentiation
Thurid Vogt | Elisabeth André
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Feature extraction is still a disputed issue for the recognition of emotions from speech. Differences in features for male and female speakers are a well-known problem, and it is established that gender-dependent emotion recognizers perform better than gender-independent ones. We propose a way to improve the discriminative quality of gender-dependent features: the emotion recognition system is preceded by an automatic gender detection that decides which of two gender-dependent emotion classifiers is used to classify an utterance. This framework was tested on two different databases, one with emotional speech produced by actors and one with spontaneous emotional speech from a Wizard-of-Oz setting. Gender detection achieved an accuracy of about 90%, and the combined gender and emotion recognition system improved the overall recognition rate of a gender-independent emotion recognition system by 2–4%.
1997
Planning Referential Acts for Animated Presentation Agents
Elisabeth Andre | Thomas Rist
Referring Phenomena in a Multimedia Context and their Computational Treatment
1994
Referring to World Objects With Text and Pictures
Elisabeth Andre | Thomas Rist
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics
1991
Co-authors
- Thomas Rist 3
- Judith Bauerdiek 1
- Lucas Bernhard 1
- Stephan Busemann 1
- Christian Dold 1
- Cristina España-Bonet 1
- Annemarie Friedrich 1
- Arnulph Fuhrmann 1
- Patrick Gebhard 1
- Kathrin Gietl 1
- Winfried Graf 1
- Tobias Hallmen 1
- Yasser Hamidullah 1
- Marcel Hauck 1
- Karoline Hillesheim 1
- Sviatlana Hoehn 1
- Corinna Jäger 1
- Johannes Kirmayr 1
- Stina Klein 1
- Yvonne Kossel 1
- Matthias Kraus 1
- Cristina Luna Jimenez 1
- Florian Matthes 1
- Wolfgang Minker 1
- Martin Misiak 1
- Fabrizio Nunnari 1
- Fred Philippy 1
- Hannes Ritschel 1
- Phillip Schneider 1
- Lukas Stappen 1
- Alexander Stricker 1
- Stefan Ultes 1
- Amelie Unger 1
- Lindsey Vanderlyn 1
- Thurid Vogt 1
- Ngoc Thang Vu 1
- Nicolas Wagner 1
- Wolfgang Wahlster 1
- Kristoffer Waldow 1
- Dieter Wallach 1
- Sonja Wecker 1
- Katharina Weitz 1
- Alexandru Wurm 1