2024
Multi-party Multimodal Conversations Between Patients, Their Companions, and a Social Robot in a Hospital Memory Clinic
Angus Addlesee | Neeraj Cherakara | Nivan Nelson | Daniel Hernandez Garcia | Nancie Gunson | Weronika Sieińska | Christian Dondrup | Oliver Lemon
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
We have deployed an LLM-based spoken dialogue system in a real hospital. The ARI social robot embodies our system, with which patients and their companions can have multi-party conversations together. In order to enable this multi-party ability, multimodality is critical. Our system therefore receives speech and video as input, and generates both speech and gestures (arm, head, and eye movements). In this paper, we describe our complex setting and the architecture of our dialogue system. Each component is detailed, and a video of the full system is available with the appropriate components highlighted in real time. Our system decides when it should take its turn, generates human-like clarification requests when the patient pauses mid-utterance, answers in-domain questions (grounding to the in-prompt knowledge), and responds appropriately to out-of-domain requests (like generating jokes or quizzes). This latter feature is particularly remarkable, as real patients often utter unexpected sentences that could not be handled previously.
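As a rough illustration of the turn-taking behaviour summarised in this abstract (not the deployed system's actual code), the sketch below shows how trailing silence from a voice activity detector might drive the choice between waiting, backchannelling, requesting clarification, and taking the turn. All thresholds, field names, and dialogue acts are assumptions made for illustration.

# Minimal sketch, assuming simple silence thresholds; the real system's
# decision logic, inputs, and labels are not shown in the abstract.
from dataclasses import dataclass

@dataclass
class SpeechEvent:
    text: str              # incremental ASR hypothesis so far
    silence_ms: int        # trailing silence reported by the VAD (assumed field)
    utterance_final: bool  # ASR end-of-utterance flag

SHORT_PAUSE_MS = 700   # assumed threshold: keep listening
LONG_PAUSE_MS = 2000   # assumed threshold: the speaker has trailed off

def decide_turn(event: SpeechEvent) -> str:
    """Return an illustrative dialogue act for the current speech event."""
    if event.utterance_final:
        return "RESPOND"       # take the turn and answer the utterance
    if event.silence_ms >= LONG_PAUSE_MS:
        return "CLARIFY"       # e.g. "Sorry, your appointment with whom?"
    if event.silence_ms >= SHORT_PAUSE_MS:
        return "BACKCHANNEL"   # nod or "mm-hmm" gesture, keep the floor open
    return "WAIT"              # the user is still speaking

if __name__ == "__main__":
    print(decide_turn(SpeechEvent("I'm here to see doctor", 2300, False)))  # CLARIFY
    print(decide_turn(SpeechEvent("Where is the cafe?", 400, True)))        # RESPOND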
2023
Multi-party Goal Tracking with LLMs: Comparing Pre-training, Fine-tuning, and Prompt Engineering
Angus Addlesee | Weronika Sieińska | Nancie Gunson | Daniel Hernandez Garcia | Christian Dondrup | Oliver Lemon
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
This paper evaluates the extent to which current LLMs can capture task-oriented multi-party conversations (MPCs). We have recorded and transcribed 29 MPCs between patients, their companions, and a social robot in a hospital. We then annotated this corpus for multi-party goal-tracking and intent-slot recognition. People share goals, answer each other’s goals, and provide other people’s goals in MPCs - none of which occur in dyadic interactions. To understand user goals in MPCs, we compared three methods in zero-shot and few-shot settings: we fine-tuned T5, created pre-training tasks to train DialogLM using LED, and employed prompt engineering techniques with GPT-3.5-turbo, to determine which approach can complete this novel task with limited data. GPT-3.5-turbo significantly outperformed the others in a few-shot setting. The ‘reasoning’ style prompt, when given 7% of the corpus as example annotated conversations, was the best performing method. It correctly annotated 62.32% of the goal tracking MPCs, and 69.57% of the intent-slot recognition MPCs. A ‘story’ style prompt increased model hallucination, which could be detrimental if deployed in safety-critical settings. We conclude that multi-party conversations still challenge state-of-the-art LLMs.
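To make the prompt-engineering comparison concrete, here is a minimal sketch of how a few-shot, 'reasoning'-style prompt for multi-party goal tracking could be assembled for a chat model such as GPT-3.5-turbo. The instruction wording, example conversation, and goal labels are invented for illustration and are not the paper's actual prompts or annotation scheme.

# Illustrative sketch only; the worked example and labels are hypothetical.
FEW_SHOT_EXAMPLES = [
    {
        "conversation": (
            "PATIENT: I need to find the X-ray department.\n"
            "COMPANION: And she also wants a coffee afterwards.\n"
            "ROBOT: The X-ray department is on the first floor."
        ),
        # Reasoning-style annotation: explain who holds each goal before labelling it.
        "annotation": (
            "The patient states her own goal (navigate: x-ray). "
            "The companion provides a goal on the patient's behalf (order: coffee). "
            "Goals: [patient -> navigate(x-ray), patient -> order(coffee, via companion)]"
        ),
    },
]

def build_messages(new_conversation: str) -> list[dict]:
    """Assemble chat messages: instruction, worked examples, then the new MPC."""
    messages = [{
        "role": "system",
        "content": ("You annotate multi-party conversations. "
                    "Reason step by step about who holds each goal, "
                    "then list the goals and their slots."),
    }]
    for ex in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": ex["conversation"]})
        messages.append({"role": "assistant", "content": ex["annotation"]})
    messages.append({"role": "user", "content": new_conversation})
    return messages

if __name__ == "__main__":
    msgs = build_messages("PATIENT: Where can I get lunch?\n"
                          "COMPANION: And he needs his hearing aid checked.")
    print(len(msgs), "messages ready to send to the chat completions endpoint")

The resulting message list would be sent to the chat model and the reply parsed into goal-tracking and intent-slot labels; the fine-tuned T5 and DialogLM/LED baselines compared in the paper would instead consume a flattened transcript directly.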
2022
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Oliver Lemon | Dilek Hakkani-Tur | Junyi Jessy Li | Arash Ashrafzadeh | Daniel Hernández Garcia | Malihe Alikhani | David Vandyke | Ondřej Dušek
A Visually-Aware Conversational Robot Receptionist
Nancie Gunson | Daniel Hernandez Garcia | Weronika Sieińska | Angus Addlesee | Christian Dondrup | Oliver Lemon | Jose L. Part | Yanchao Yu
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Socially Assistive Robots (SARs) have the potential to play an increasingly important role in a variety of contexts including healthcare, but most existing systems have very limited interactive capabilities. We will demonstrate a robot receptionist that not only supports task-based and social dialogue via natural spoken conversation but is also capable of visually grounded dialogue; able to perceive and discuss the shared physical environment (e.g. helping users to locate personal belongings or objects of interest). Task-based dialogues include check-in, navigation and FAQs about facilities, alongside social features such as chit-chat, access to the latest news and a quiz game to play while waiting. We also show how visual context (objects and their spatial relations) can be combined with linguistic representations of dialogue context, to support visual dialogue and question answering. We will demonstrate the system on a humanoid ARI robot, which is being deployed in a hospital reception area.
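As a sketch of the visually grounded question answering described above (not the demonstrated system's implementation), the snippet below grounds a "where is X?" question in a symbolic scene of detected objects and coarse spatial relations. The object labels, coordinates, and relation rules are invented for illustration.

# Rough sketch under assumed detections; the demo system's perception and
# dialogue representations are not detailed in the abstract.
from dataclasses import dataclass

@dataclass
class DetectedObject:
    label: str
    x: float   # horizontal position in the camera frame (metres, assumed)
    y: float   # depth from the robot (metres, assumed)

SCENE = [
    DetectedObject("bag", 0.6, 1.2),
    DetectedObject("chair", 0.5, 1.0),
    DetectedObject("reception desk", -1.0, 2.0),
]

def spatial_relation(a: DetectedObject, b: DetectedObject) -> str:
    """Very coarse qualitative relation between two detected objects."""
    if abs(a.x - b.x) < 0.3 and a.y > b.y:
        return f"behind the {b.label}"
    return f"to the {'right' if a.x > b.x else 'left'} of the {b.label}"

def answer_where(query_label: str) -> str:
    """Ground a 'where is X?' question in the current visual scene."""
    target = next((o for o in SCENE if o.label == query_label), None)
    if target is None:
        return f"I can't see a {query_label} right now."
    landmark = min((o for o in SCENE if o is not target),
                   key=lambda o: abs(o.x - target.x) + abs(o.y - target.y))
    return f"Your {query_label} is {spatial_relation(target, landmark)}."

if __name__ == "__main__":
    print(answer_where("bag"))  # e.g. "Your bag is behind the chair."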