Dialogue & Discourse (2019)


Dialogue & Discourse, Volume 10

Everyday communication is enriched by the visual environment that listeners concomitantly link to the linguistic input. Whether and when visual cues are integrated into the mental meaning representation of the communicative setting is still unclear. In our earlier findings, the integration of linguistic cues (i.e., topic-hood of a discourse referent) reduced discourse updating costs of the mental representation as indicated by reduced sentence-initial processing costs of the non-canonical word order in German. In the present study we tried to replicate our earlier findings by replacing the linguistic cue with a visual attention-capture cue presented below the threshold of perception in order to direct participants’ attention to a depicted referent. While this type of cue has previously been shown to modulate word order preferences in sentence production, we found no effects on sentence comprehension. We discuss possible theory-based reasons for the null effect of the implicit visual cue as well as methodological caveats and issues that should be considered in future research on multimodal meaning integration.
We examined the predictive value of wait signals for sarcasm in online debate forums. In a corpus comparison we examined the word frequency of um and uh across six corpora. In general, there were far more fillers in spoken corpora than written corpora. We also found that the proportion of ums to uhs varied by corpus type. In Experiment 1 we tested whether the inclusion of um or uh at the beginning of online debate forum posts led to higher probability of those posts being classified as sarcastic by Amazon Mechanical Turk workers. We found that posts beginning with these items were twice as likely to be labeled sarcastic. In Experiment 2 we tested fillers and ellipses in the middle of posts. We found that posts including these items were approximately three to five times more likely to be labeled sarcastic. We compared results to other signals like the word obviously and quotation marks. Signals that indicate delay in written communication cue readers to non-literal meaning.
How does thematic role predictability affect reference production? This study tests a planning facilitation hypothesis – that the predictability effect on reference form can be explained in terms of the time course of utterance planning. In a discourse production task, participants viewed two sequential event pictures, listened to a description of the first picture (depicting a transfer event between two characters), and then provided a description of the second picture (continuing with one thematic role character, either goal or source). We replicated previous findings that goal continuations lead to more reduced forms of reference and shorter latency to begin speaking than source continuations. Additionally, we tracked speakers’ eye movements in two periods of utterance planning, early vs. late. We found that 1) early pre-planning supports the use of reduced forms but is not affected by thematic role; 2) thematic role only affects late planning; and 3) in contrast with our hypothesis, planning does not account for predictability effects on reduced forms. We then speculate that discourse connectedness drives the thematic role predictability effect on reference form choice.
The Cognitive approach to Coherence Relations (Sanders, Spooren, & Noordman, 1992) was originally proposed as a set of cognitively plausible primitives to order coherence relations, but is also increasingly used as a discourse annotation scheme. This paper provides an overview of new CCR distinctions that have been proposed over the years, summarizes the most important discussions about the operationalization of the primitives, and introduces a new distinction (disjunction) to the taxonomy to improve the descriptive adequacy of CCR. In addition, it reflects on the use of the CCR as an annotation scheme in practice. The overall aim of the paper is to provide an overview of state-of-the-art CCR for discourse annotation that can form, together with the original 1992 proposal, a comprehensive starting point for anyone interested in annotating discourse using CCR.
Discourse-annotated corpora are an important resource for the community, but they are often annotated according to different frameworks. This makes joint usage of the annotations difficult, preventing researchers from searching the corpora in a unified way, or using all annotated data jointly to train computational systems. Several theoretical proposals have recently been made for mapping the relational labels of different frameworks to each other, but these proposals have so far not been validated against existing annotations. The two largest discourse relation annotated resources, the Penn Discourse Treebank and the Rhetorical Structure Theory Discourse Treebank, have however been annotated on the same texts, allowing for a direct comparison of the annotation layers. We propose a method for automatically aligning the discourse segments, and then evaluate existing mapping proposals by comparing the empirically observed against the proposed mappings. Our analysis highlights the influence of segmentation on subsequent discourse relation labelling, and shows that while agreement between frameworks is reasonable for explicit relations, agreement on implicit relations is low. We identify several sources of systematic discrepancies between the two annotation schemes and discuss consequences for future annotation and for usage of the existing resources.
Storytelling is an integral part of daily life and a key part of how we share information and connect with others. The ability to use Natural Language Generation (NLG) to produce stories that are tailored and adapted to the individual reader could have a large impact in many different applications. However, one reason that this has not become a reality to date is the NLG story gap, a disconnect between the plan-type representations that story generation engines produce and the linguistic representations needed by NLG engines. Here we describe Fabula Tales, a storytelling system supporting both story generation and NLG. With manual annotation of texts from existing stories using an intuitive user interface, Fabula Tales automatically extracts the underlying story representation and its accompanying syntactically grounded representation. Narratological and sentence planning parameters are applied to these structures to generate different versions of the story. We show how our storytelling system can alter the story at the sentence level as well as the discourse level. We also show that our approach can be applied to different kinds of stories by testing it on both Aesop’s Fables and first-person blogs posted on social media. The content and genre of such stories vary widely, supporting our claim that our approach is general and domain independent. We then conduct several user studies to evaluate the generated story variations and show that Fabula Tales’ automatically produced variations are perceived as more immediate, interesting, and correct, and are preferred to a baseline generation system that does not use narrative parameters.
Gestures that co-occur with speech are a fundamental component of communication. Prior research with children suggests that gestures may help them to resolve certain forms of lexical ambiguity, including homophones. To test this idea in the context of human-robot interaction, the effects of iconic and deictic gestures on the understanding of homophones were assessed in an experiment where a humanoid robot told a short story containing pairs of homophones to small groups of young participants, accompanied by either expressive gestures or no gestures. Both groups of subjects completed a pretest and post-test to measure their ability to discriminate between pairs of homophones, and we calculated aggregated precision. The results show that the use of iconic and deictic gestures aids in general understanding of homophones, providing additional evidence for the importance of gesture to the development of children’s language and communication skills.
Following recent proposals to handle natural language generation in spoken dialogue systems with long short-term memory recurrent neural network models (Wen et al., 2016a), we first investigate a variant thereof with the objective of better integrating the attention subnetwork. Our next objective is to propose and evaluate a framework to adapt the NLG module online through direct interactions with the users. The basic way of doing so is to ask the user to utter an alternative sentence to express a particular dialogue act. The system then has to decide between using an automatic transcription and asking for a manual transcription. To this end, a reinforcement learning approach based on an adversarial bandit scheme is adopted. We show that by appropriately defining the rewards as a linear combination of expected payoffs and costs of acquiring the new data provided by the user, a system design can balance improving the system’s performance towards a better match with the user’s preferences against the burden associated with it. The actual benefits of this system are then assessed with a human evaluation, showing that the addition of more diverse utterances makes it possible to produce sentences that are more satisfying for the user.
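As a rough illustration of the kind of adversarial bandit decision this abstract describes, the sketch below uses EXP3, a standard adversarial bandit algorithm. The abstract does not name the algorithm, so EXP3 is an assumption here, as are the arm names, the simulated payoff and cost values, the cost weight, and all function names.

```python
import math
import random

def exp3_probabilities(weights, gamma):
    """Mix the normalised weights with uniform exploration (EXP3)."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]

def exp3_update(weights, probs, arm, reward, gamma):
    """Update only the pulled arm, using an importance-weighted reward in [0, 1]."""
    k = len(weights)
    estimated = reward / probs[arm]
    updated = list(weights)
    updated[arm] *= math.exp(gamma * estimated / k)
    return updated

def net_reward(payoff, cost, cost_weight):
    """Hypothetical reward: linear combination of payoff and cost, clipped to [0, 1]."""
    return min(1.0, max(0.0, payoff - cost_weight * cost))

def run(rounds, gamma=0.1, cost_weight=1.0, seed=0):
    rng = random.Random(seed)
    arms = ["automatic_transcription", "manual_transcription"]
    # Made-up (payoff, cost) pairs for illustration only, not values from the paper.
    simulated = {0: (0.6, 0.1), 1: (0.9, 0.8)}
    weights = [1.0, 1.0]
    for _ in range(rounds):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(len(arms)), weights=probs)[0]
        payoff, cost = simulated[arm]
        reward = net_reward(payoff, cost, cost_weight)
        weights = exp3_update(weights, probs, arm, reward, gamma)
    return dict(zip(arms, exp3_probabilities(weights, gamma)))

if __name__ == "__main__":
    print(run(500))
```

With these made-up values the manual option has the higher payoff but also the higher cost, so its net reward is lower and the policy drifts towards automatic transcription. Raising the payoff or lowering `cost_weight` shifts that balance.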
This paper describes how Rhetorical Structure Theory (RST) and relational propositions can be used to define a method for rendering and analyzing texts as expressions in propositional logic. Relational propositions, the implicit assertions that correspond to RST relations, are defined using standard logical operators and rules of inference. The resulting logical forms are used to construct logical expressions that map to RST tree structures. The resulting expressions show that inference is pervasive within coherent texts. To support reasoning over these expressions, a set of rules for negation is defined. The logical forms and their negation rules can be used to examine the flow of reasoning and the effects of incoherence. Because there is a correspondence between logical coherence and the functional relationships of RST, an RST analysis that cannot pass the test of logic is indicative either of a problematic analysis or of an incoherent text. The result is a method for analyzing the logic implicit within discursive reasoning.
This paper presents an approach to flexible and adaptive dialogue management driven by cognitive modelling of human dialogue behaviour. Artificial intelligent agents based on the ACT-R cognitive architecture participate, together with human actors, in (meta)cognitive skills training within a negotiation scenario. The agent employs instance-based learning to decide about its own actions and to reflect on the behaviour of the opponent. We show that task-related actions can be handled by a cognitive agent who is a plausible dialogue partner. Separating task-related and dialogue control actions enables the application of sophisticated models along with a flexible architecture in which various alternative modelling methods can be combined. We evaluated the proposed approach with users assessing the relative contribution of various factors to the overall usability of a dialogue system. Subjective perceptions of effectiveness, efficiency and satisfaction were correlated with various objective performance metrics, e.g. number of (in)appropriate system responses, recovery strategies, and interaction pace. It was observed that the dialogue system usability is determined most by the quality of agreements reached in terms of estimated Pareto optimality, by the negotiation strategies selected by the user, and by the quality of system recognition, interpretation and responses. We compared human-human and human-agent performance with respect to the number and quality of agreements reached, estimated cooperativeness level, and frequency of accepted negative outcomes. Evaluation experiments showed promising, consistently positive results throughout the range of the relevant scales.
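The Pareto optimality criterion mentioned in this abstract can be made concrete with a small sketch. An agreement is Pareto optimal if no other candidate agreement is at least as good for every party and strictly better for at least one. The utility tuples and function names below are hypothetical, and the paper's own estimation procedure may differ.

```python
def dominates(a, b):
    """True if outcome a is at least as good as b for every party and strictly better for one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(outcomes):
    """Return the outcomes that no other outcome dominates."""
    return [o for o in outcomes if not any(dominates(other, o) for other in outcomes)]

if __name__ == "__main__":
    # Hypothetical (utility_party_1, utility_party_2) pairs for candidate agreements.
    agreements = [(3, 1), (2, 2), (1, 3), (1, 1), (2, 1)]
    print(pareto_front(agreements))
```

Here `(1, 1)` and `(2, 1)` drop out because `(2, 2)` and `(3, 1)` respectively are no worse for either party and better for at least one.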