Angus Addlesee

2022

pdf abs
Securely Capturing People’s Interactions with Voice Assistants at Home: A Bespoke Tool for Ethical Data Collection
Angus Addlesee
Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)

Speech production is nuanced and unique to every individual, but today’s Spoken Dialogue Systems (SDSs) are trained to use general speech patterns to successfully improve performance on various evaluation metrics. However, these patterns do not apply to certain user groups - often the very people that can benefit the most from SDSs. For example, people with dementia produce more disfluent speech than the general population. In order to evaluate systems with specific user groups in mind, and to guide the design of such systems to deliver maximum benefit to these users, data must be collected securely. In this short paper we present CVR-SI, a bespoke tool for ethical data collection. Designed for the healthcare domain, we argue that it should also be used in more general settings. We detail how off-the-shelf solutions fail to ensure that sensitive data remains secure and private. We then describe the ethical design and security features of our device, with a full guide on how to build both the hardware and software components of CVR-SI. Our design ensures inclusivity to all researchers in this field, particularly those who are not hardware experts. This guarantees everyone can collect appropriate data for human evaluation ethically, securely, and in a timely manner.

Socially Assistive Robots (SARs) have the potential to play an increasingly important role in a variety of contexts including healthcare, but most existing systems have very limited interactive capabilities. We will demonstrate a robot receptionist that not only supports task-based and social dialogue via natural spoken conversation but is also capable of visually grounded dialogue; able to perceive and discuss the shared physical environment (e.g. helping users to locate personal belongings or objects of interest). Task-based dialogues include check-in, navigation and FAQs about facilities, alongside social features such as chit-chat, access to the latest news and a quiz game to play while waiting. We also show how visual context (objects and their spatial relations) can be combined with linguistic representations of dialogue context, to support visual dialogue and question answering. We will demonstrate the system on a humanoid ARI robot, which is being deployed in a hospital reception area.

2021

pdf bib abs
Incremental Graph-Based Semantics and Reasoning for Conversational AI
Angus Addlesee | Arash Eshghi
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

The next generation of conversational AI systems need to: (1) process language incrementally, token-by-token to be more responsive and enable handling of conversational phenomena such as pauses, restarts and self-corrections; (2) reason incrementally allowing meaning to be established beyond what is said; (3) be transparent and controllable, allowing designers as well as the system itself to easily establish reasons for particular behaviour and tailor to particular user groups, or domains. In this short paper we present ongoing preliminary work combining Dynamic Syntax (DS) - an incremental, semantic grammar framework - with the Resource Description Framework (RDF). This paves the way for the creation of incremental semantic parsers that progressively output semantic RDF graphs as an utterance unfolds in real-time. We also outline how the parser can be integrated with an incremental reasoning engine through RDF. We argue that this DS-RDF hybrid satisfies the desiderata listed above, yielding semantic infrastructure that can be used to build responsive, real-time, interpretable Conversational AI that can be rapidly customised for specific user groups such as people with dementia.

Visual Question Answering (VQA) systems are increasingly adept at a variety of tasks, and this technology can be used to assist blind and partially sighted people. To do this, the system’s responses must not only be accurate, but usable. It is also vital for assistive technologies to be designed with a focus on: (1) privacy, as the camera may capture a user’s mail, medication bottles, or other sensitive information; (2) transparency, so that the system’s behaviour can be explained and trusted by users; and (3) controllability, to tailor the system for a particular domain or user group. We have therefore extended a conversational VQA framework, called Aye-saac, with these objectives in mind. Specifically, we gave Aye-saac the ability to answer visual questions in the kitchen, a particularly challenging area for visually impaired people. Our system can now answer questions about quantity, positioning, and system confidence in regards to 299 kitchen objects. Questions about the spatial relations between these objects are particularly helpful to visually impaired people, and our system output more usable answers than other state of the art end-to-end VQA systems.

2020

pdf abs
A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI
Angus Addlesee | Yanchao Yu | Arash Eshghi
Proceedings of the 28th International Conference on Computational Linguistics

Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous with several options existing currently as a service (e.g. Google, IBM, and Microsoft). Currently the most stringent standards for such systems are set within the context of their use in, and for, Conversational AI technology. These systems are expected to operate incrementally in real-time, be responsive, stable, and robust to the pervasive yet peculiar characteristics of conversational speech such as disfluencies and overlaps. In this paper we evaluate the most popular of such systems with metrics and experiments designed with these standards in mind. We also evaluate the speaker diarization (SD) capabilities of the same systems which will be particularly important for dialogue systems designed to handle multi-party interaction. We found that Microsoft has the leading incremental ASR system which preserves disfluent materials and IBM has the leading incremental SD system in addition to the ASR that is most robust to speech overlaps. Google strikes a balance between the two but none of these systems are yet suitable to reliably handle natural spontaneous conversations in real-time.

Co-authors

Adrien Fabre 1

Ruben Kruiper 1

Nancie Gunson 1

Daniel Hernández Garcia 1

Weronika Sieińska 1

Christian Dondrup 1

Jose L. Part 1

Angus Addlesee

2022

2021

2020

Co-authors

Venues