The next generation of conversational AI systems needs to: (1) process language incrementally, token by token, to be more responsive and to enable handling of conversational phenomena such as pauses, restarts and self-corrections; (2) reason incrementally, allowing meaning to be established beyond what is said; (3) be transparent and controllable, allowing designers as well as the system itself to easily establish reasons for particular behaviour and to tailor the system to particular user groups or domains. In this short paper we present ongoing preliminary work combining Dynamic Syntax (DS), an incremental, semantic grammar framework, with the Resource Description Framework (RDF). This paves the way for the creation of incremental semantic parsers that progressively output semantic RDF graphs as an utterance unfolds in real time. We also outline how the parser can be integrated with an incremental reasoning engine through RDF. We argue that this DS-RDF hybrid satisfies the desiderata listed above, yielding semantic infrastructure that can be used to build responsive, real-time, interpretable Conversational AI that can be rapidly customised for specific user groups such as people with dementia.
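To make the intended output concrete, the following is a minimal illustrative sketch, not the authors' DS-RDF parser: it assumes rdflib and invented per-token update rules, and simply shows a cumulative RDF graph becoming available after every token of an utterance.

```python
# Minimal sketch only: a growing RDF graph emitted token by token.
# Assumes rdflib; the per-token update rules below are invented stand-ins
# for Dynamic Syntax parse steps, not the framework itself.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/sem/")

def incremental_parse(tokens):
    """Yield the cumulative semantic graph after each token."""
    g = Graph()
    g.bind("ex", EX)
    event = EX["e0"]  # hypothetical event node
    for i, token in enumerate(tokens):
        if token == "likes":                      # toy lexical rule
            g.add((event, EX.predicate, Literal("like")))
        elif i == 0:
            g.add((event, EX.agent, Literal(token)))
        else:
            g.add((event, EX.patient, Literal(token)))
        yield g                                   # partial graph mid-utterance

for step, graph in enumerate(incremental_parse(["Mary", "likes", "John"]), 1):
    print(f"after token {step}: {len(graph)} triple(s)")
```

A downstream reasoning engine could consume each intermediate graph as it is yielded, which is where an incremental reasoner would plug in.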
In the visual dialog task GuessWhat?!, two players maintain a dialog in order to identify a secret object in an image. Computationally, this is modeled using a question generation module and a guesser module for the questioner role, and an answering model, the Oracle, to answer the generated questions. This raises a question: what is the risk of having an imperfect oracle model? Here we present work in progress studying the impact of different answering models on human-generated questions in GuessWhat?!. We show that having access to better-quality answers has a direct impact on the guessing task for human dialogs, and we argue that better answers could help train better question generation models.
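As a rough illustration of the kind of comparison involved, the sketch below uses hypothetical stand-ins for the GuessWhat?! models: a gold oracle, a noisy oracle, and a toy guesser, so that guessing accuracy can be measured under answerers of different quality. None of this is the paper's actual setup.

```python
# Illustrative only: how oracle (answerer) quality can be varied while the
# questions and guesser are held fixed. All components are toy stand-ins.
import random

def gold_oracle(question, target):
    return target["attributes"].get(question, "no")

def noisy_oracle(question, target, p_err=0.3):
    if random.random() < p_err:                    # imperfect answering model
        return random.choice(["yes", "no"])
    return gold_oracle(question, target)

def toy_guesser(dialog, candidates):
    # Pick the candidate object most consistent with the answers received.
    def score(obj):
        return sum(obj["attributes"].get(q, "no") == a for q, a in dialog)
    return max(candidates, key=score)

def guess_accuracy(games, oracle):
    hits = 0
    for questions, candidates, target in games:
        dialog = [(q, oracle(q, target)) for q in questions]
        hits += toy_guesser(dialog, candidates) is target
    return hits / len(games)
```

Comparing `guess_accuracy(games, gold_oracle)` with `guess_accuracy(games, noisy_oracle)` then quantifies how much an imperfect oracle costs the guessing task.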
Starting from an existing account of semantic classification and learning from interaction formulated in a Probabilistic Type Theory with Records, encompassing Bayesian inference and learning with a frequentist flavour, we identify some problems with this account and provide an alternative account of classification learning that addresses them. The proposed account is also broadly Bayesian in nature, but instead uses a linear transformation model for classification and learning.
We present a conversational management act (CMA) annotation schema for one-to-one tutorial dialogue sessions in which a tutor uses an analogy to teach a student a concept. CMAs are finer-grained, sub-utterance acts than traditional dialogue act mark-up. The schema achieves an inter-annotator agreement (IAA) of at least 0.66 Cohen's kappa across all 10 classes. We annotate a corpus of analogical episodes with the schema and develop statistical sequence models from the corpus that predict tutor content-related decisions: the selection of the analogical component (AC) and the tutor conversational management act (TCMA) to deploy at the current utterance, given the student's behaviour. CRF sequence classifiers perform well on AC selection and robustly on TCMA selection, achieving accuracies of 61.9% and 56.3% respectively in a cross-validation experiment over the corpus.
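For readers unfamiliar with this kind of setup, a minimal sketch of such a sequence model follows; it assumes the sklearn-crfsuite library and invented feature and label names, and is not the paper's feature set.

```python
# Sketch under assumptions (sklearn-crfsuite, hypothetical features/labels):
# predicting the tutor's next move from student behaviour across an episode.
import sklearn_crfsuite

def utterance_features(episode, i):
    utt = episode[i]
    return {
        "student_act": utt["student_act"],                        # hypothetical
        "prev_tutor_act": episode[i - 1]["tutor_act"] if i else "START",
        "position": i,
    }

def episode_to_instance(episode):
    X = [utterance_features(episode, i) for i in range(len(episode))]
    y = [utt["tutor_act"] for utt in episode]                     # e.g. TCMA labels
    return X, y

def train_crf(episodes):
    X, y = zip(*(episode_to_instance(ep) for ep in episodes))
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                               max_iterations=100)
    crf.fit(list(X), list(y))
    return crf
```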
Visual Question Answering (VQA) systems are increasingly adept at a variety of tasks, and this technology can be used to assist blind and partially sighted people. To do this, the system's responses must not only be accurate, but also usable. It is also vital for assistive technologies to be designed with a focus on: (1) privacy, as the camera may capture a user's mail, medication bottles, or other sensitive information; (2) transparency, so that the system's behaviour can be explained and trusted by users; and (3) controllability, to tailor the system for a particular domain or user group. We have therefore extended a conversational VQA framework, called Aye-saac, with these objectives in mind. Specifically, we gave Aye-saac the ability to answer visual questions in the kitchen, a particularly challenging area for visually impaired people. Our system can now answer questions about quantity, positioning, and system confidence with regard to 299 kitchen objects. Questions about the spatial relations between these objects are particularly helpful to visually impaired people, and our system produces more usable answers than other state-of-the-art end-to-end VQA systems.
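As a hint of how such spatial answers can be produced, here is a minimal sketch that derives a locative phrase from two detected bounding boxes; the box format and object names are assumptions, and this is not the Aye-saac implementation.

```python
# Sketch only: turning two detections (label, x1, y1, x2, y2) into a spatial
# relation phrase. Image coordinates grow rightwards and downwards.
def centre(box):
    _, x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def spatial_relation(a, b):
    """Describe where object a sits relative to object b."""
    (ax, ay), (bx, by) = centre(a), centre(b)
    horiz = "to the left of" if ax < bx else "to the right of"
    vert = "above" if ay < by else "below"
    # Report the axis with the larger separation for a single usable phrase.
    return horiz if abs(ax - bx) >= abs(ay - by) else vert

mug = ("mug", 40, 120, 90, 180)
kettle = ("kettle", 200, 100, 300, 220)
print(f"The {mug[0]} is {spatial_relation(mug, kettle)} the {kettle[0]}.")
```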
In this paper we argue that the nature of dogwhistle communication is essentially dialogical, and that to account for dogwhistle meaning we must consider dialogical events in which dialogue partners can draw different conclusions based on communicative events. This leads us to a theory based on inference. However, as identified by Khoo (2017) and emphasised by Henderson & McCready (2018), a problematic aspect of this approach is that expressions with similar meanings are analysed as generating the same dogwhistle inferences, which does not always appear to be the case. By modelling meaning in terms of intensional types in TTR, we avoid this problem.
The shift to neural models in Referring Expression Generation (REG) has enabled more natural set-ups, but at the cost of interpretability. We argue that integrating pragmatic reasoning into the inference of context-agnostic generation models could reconcile traits of traditional and neural REG, as this offers a separation between context-independent, literal information and pragmatic adaptation to context. With this in mind, we apply existing decoding strategies from discriminative image captioning to REG and evaluate them in terms of pragmatic informativity, likelihood of the ground-truth annotations, and linguistic diversity. Our results show general effectiveness, but a relatively small gain in informativity, raising important questions for REG in general.
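To give a feel for what pragmatic decoding over a context-agnostic generator looks like, the following is a minimal RSA-style reranking sketch with invented scores; it illustrates the general family of strategies, not the specific decoding methods evaluated in the paper.

```python
# Minimal sketch: rerank candidate expressions from a context-agnostic speaker
# by how strongly a simulated literal listener would pick out the target.
import math

def pragmatic_rerank(candidates, literal_scores, target, objects, alpha=1.0):
    """literal_scores[(expr, obj)] is a context-independent log-score."""
    def listener_prob(expr):
        weights = {o: math.exp(alpha * literal_scores[(expr, o)]) for o in objects}
        return weights[target] / sum(weights.values())
    return sorted(candidates, key=listener_prob, reverse=True)

objects = ["dog_left", "dog_right"]
candidates = ["the dog", "the dog on the left"]
literal_scores = {
    ("the dog", "dog_left"): -0.1, ("the dog", "dog_right"): -0.1,
    ("the dog on the left", "dog_left"): -0.3,
    ("the dog on the left", "dog_right"): -3.0,
}
print(pragmatic_rerank(candidates, literal_scores, "dog_left", objects))
```

Here the base model's most likely expression ("the dog") is demoted because it does not discriminate the target from the distractor, which is exactly the separation between literal generation and pragmatic adaptation described above.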
As AI reaches wider adoption, designing systems that are explainable and interpretable becomes a critical necessity. In particular, when it comes to dialogue systems, their reasoning must be transparent and must comply with human intuitions in order for them to be integrated seamlessly into day-to-day collaborative human-machine activities. Here, we describe our ongoing work on a general-purpose dialogue system equipped with a spatial specialist with explanatory capabilities. We applied this system to the task of characterizing spatial configurations of blocks in a simple physical Blocks World (BW) domain using natural locative expressions, as well as generating justifications for the proposed spatial descriptions by indicating the factors that the system used to arrive at a particular conclusion.
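A toy sketch of the explanatory behaviour may help: the factor names and block representation below are hypothetical, and the code only illustrates the idea of reporting the factors behind a chosen spatial description, not the described system.

```python
# Illustrative sketch: check a locative relation ("on") between two blocks and
# expose the contributing factors as a justification.
def describe_on(block, reference):
    factors = {
        "vertical contact": block["z"] == reference["z"] + reference["height"],
        "horizontal overlap": abs(block["x"] - reference["x"]) < reference["width"],
    }
    verdict = "is" if all(factors.values()) else "is not"
    reasons = ", ".join(f"{name}: {'yes' if ok else 'no'}"
                        for name, ok in factors.items())
    return f"The {block['name']} {verdict} on the {reference['name']} ({reasons})."

red = {"name": "red block", "x": 1.0, "z": 1.0, "width": 1.0, "height": 1.0}
blue = {"name": "blue block", "x": 1.2, "z": 0.0, "width": 1.0, "height": 1.0}
print(describe_on(red, blue))
```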
In this paper we argue that dialogue systems can make use of enthymematic reasoning to actively explain their decisions. We motivate why this is an appropriate strategy and integrate it into our own proof-theoretic dialogue manager framework based on linear logic. In particular, this enables a dialogue system to provide reasonable answers to why-questions that query information previously given by the system.