Dialogue & Discourse (2018)


Dialogue & Discourse, Volume 9

The literature on Romance null-subject languages has often postulated a division of labor between Null and Overt pronouns: Nulls prefer to retrieve an antecedent in subject position, whereas Overts prefer an antecedent in a lower syntactic position (Carminati, 2002). However, recent research on English pronouns (Rohde and Kehler, 2014) has shown that grammatical function alone cannot explain pronoun interpretation. According to the model proposed in that research, pronoun interpretation and production are sensitive to different sets of factors and, instead of being mirror images of each other, are related probabilistically in a Bayesian fashion. This paper tests this model with Catalan data from two discourse-completion experiments to study the grammatical and pragmatic factors that affect the interpretation and production of Null and Overt pronouns. Our main result is that both Null and Overt pronouns present asymmetries regarding their interpretation and production: (1) the production of Null pronouns is affected mainly by grammatical factors (they are subject-biased), but their interpretation is also influenced by pragmatic factors (in particular, rhetorical relations), and (2) while Overt pronouns have a strong interpretation bias towards the object, the data indicate that they are not the preferred form to refer to the object.
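The Bayesian relation between production and interpretation mentioned in the abstract above can be illustrated with a small sketch. It applies Bayes' rule, P(referent | pronoun) ∝ P(pronoun | referent) · P(referent). The function name `interpret` and all probabilities are hypothetical values chosen for illustration; they are not results from the paper.

```python
def interpret(prior, production):
    """Bayes' rule: P(referent | pronoun) is proportional to
    P(pronoun | referent) * P(referent)."""
    joint = {ref: production[ref] * prior[ref] for ref in prior}
    total = sum(joint.values())
    return {ref: p / total for ref, p in joint.items()}

# Hypothetical production bias: how likely a speaker is to use a null
# pronoun for each referent. Assumed here to depend on grammar only.
p_null_given_ref = {"subject": 0.8, "object": 0.2}

# Hypothetical next-mention priors under two rhetorical relations.
# Pragmatic factors change the prior, not the production bias.
prior_object_biased = {"subject": 0.4, "object": 0.6}
prior_subject_biased = {"subject": 0.8, "object": 0.2}

post_a = interpret(prior_object_biased, p_null_given_ref)
post_b = interpret(prior_subject_biased, p_null_given_ref)

print(round(post_a["subject"], 3))
print(round(post_b["subject"], 3))
```

With the production bias held fixed, the interpretation of the null pronoun still shifts as the prior changes, which is one way production and interpretation can be sensitive to different factors without being mirror images of each other.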
Corpus-based studies in various languages have demonstrated that some connectives are used preferentially to express subjective versus objective meanings, for example, omdat vs. want in Dutch. However, Spanish connectives have been understudied from this perspective. Moreover, most studies of subjectivity have focused on explicit relations, and little is known about the subjectivity of implicit coherence relations. In addition, the role that text type plays in the meaning and use of causal relations and their connectives is still under discussion. This study aims to analyze the local contexts of Spanish causal explicit and implicit relations in different text types by carrying out manual analyses of subjectivity. A total of 360 relations marked by three prototypical causal connectives and 120 implicit relations were extracted from academic and journalistic texts. The analytical model applied is based on an integrative approach to subjectivity. Statistical analyses indicate a particular behavior of Spanish connectives and implicit relations and a three-way interaction between subjectivity, text type, and linguistic marking in journalistic texts. Therefore, this study reveals new insights into subjectivity in Spanish discourse.
Languages vary in how they encode and interpret attested information. The present research examined how users of Turkish and English construe utterances containing evidential information, in particular, whether evidential information is interpreted strictly as conveying source information (firsthand or non-firsthand), or whether it is also perceived as signaling the reliability of particular sources. Participants read sentences in their respective language presented in various source and modal forms and were asked to judge the source of information of the proposition and their confidence in whether the asserted event actually happened. It was found that there was sufficient information from evidential and modal expressions to make both source and probability of occurrence judgments, although the groups differed somewhat in their judgment patterns. The findings are taken to suggest that, for both Turkish and English speakers, evidentiality and epistemic modality overlap to some extent but the two do not function in exactly the same way.
In Discourse Studies, concessions are considered among those argumentative strategies that increase persuasion. We aim to empirically test this hypothesis by calculating the distribution of argumentative concessions in persuasive vs. non-persuasive comments from the ChangeMyView subreddit. This constitutes a challenging task since concessions do not always bear an argumentative role and are expressed through polysemous lexical markers. Drawing from a theoretically-informed typology of concessions, we first conduct a crowdsourcing task to label a set of polysemous lexical markers as introducing an argumentative concession relation or not. Second, we present a self-training method to automatically identify argumentative concessions using linguistically motivated features. While we achieve a moderate F1 of 57.4% via the self-training method, our subsequent error analysis highlights that the self-training method is able to generalize and identify other types of concessions that are argumentative, but were not considered in the annotation guidelines. Our findings from the manual labeling and the classification experiments indicate that the type of argumentative concessions we investigated is almost equally likely to be used in winning and losing arguments. While this result seems to contradict theoretical assumptions, we provide some reasons related to the ChangeMyView subreddit.
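The general idea of self-training referred to in the abstract above can be sketched as follows: train on a small labeled set, label the unlabeled pool, keep only confident predictions as new training data, and repeat. This is a minimal sketch, not the authors' method. The nearest-centroid classifier, the confidence score, the threshold, the two-dimensional feature vectors, and the labels `arg` / `non_arg` are all assumptions made for illustration.

```python
def fit(labeled):
    """Compute one centroid per label from (features, label) pairs."""
    groups = {}
    for x, label in labeled:
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(centroids, x):
    """Return (label, confidence) for a two-class nearest-centroid model.
    Confidence is the share of total distance owed to the farther centroid."""
    dists = {label: sum((a - b) ** 2 for a, b in zip(c, x)) ** 0.5
             for label, c in centroids.items()}
    best, second = sorted(dists, key=dists.get)[:2]
    total = dists[best] + dists[second]
    conf = 1.0 if total == 0 else dists[second] / total
    return best, conf

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Iteratively move confidently pseudo-labeled items into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = predict(centroids, x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return fit(labeled), labeled

# Invented toy data: two seed examples and four unlabeled feature vectors.
seed = [([0.0, 0.0], "non_arg"), ([1.0, 1.0], "arg")]
pool = [[0.1, 0.0], [0.9, 1.0], [0.5, 0.5], [0.8, 0.9]]
model, final_labeled = self_train(seed, pool)
print(len(final_labeled), predict(model, [0.95, 0.9])[0])
```

In this toy run the ambiguous point `[0.5, 0.5]` never reaches the confidence threshold and so stays unlabeled, which is the safeguard that keeps noisy pseudo-labels out of the training set.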
This paper presents an analysis of discourse markers in two spontaneous speech corpora for European Portuguese - university lectures and map-task dialogues - and also in a collection of tweets, aiming to contribute to their categorization, which barely exists for European Portuguese. Our results show that the selection of discourse markers is domain and speaker dependent. We also found that the most frequent discourse markers are similar in all three corpora, despite tweets containing discourse markers not found in the other two corpora. In this multidisciplinary study, comprising both a linguistic perspective and a computational approach, discourse markers are also automatically discriminated from other structural metadata events, namely sentence-like units and disfluencies. Our results show that discourse markers and disfluencies tend to co-occur in the dialogue corpus, but have a complementary distribution in the university lectures. We used three acoustic-prosodic feature sets and machine learning to automatically distinguish between discourse markers, disfluencies and sentence-like units. Our in-domain experiments achieved an accuracy of about 87% in university lectures and 84% in dialogues, in line with our previous results. The eGeMAPS features, commonly used for other paralinguistic tasks, achieved considerable performance on our data, especially considering the small size of the feature set. Our results suggest that turn-initial discourse markers are usually easier to classify than disfluencies, a result also previously reported in the literature. We conducted a cross-domain evaluation in order to evaluate the robustness of the models across domains. The results achieved are about 11%-12% lower, but we conclude that data from one domain can still be used to classify the same events in the other. Overall, despite the complexity of this task, these are very encouraging state-of-the-art results.
Ultimately, using exclusively acoustic-prosodic cues, discourse markers can be discriminated fairly well from disfluencies and sentence-like units (SUs). In order to better understand the contribution of each feature, we have also reported the impact of the features in both the dialogues and the university lectures. Pitch features, namely pitch slopes, are the most relevant ones for the distinction between discourse markers and disfluencies. These features are in line with the wide pitch range of discourse markers, in a continuum from a very compressed pitch range to a very wide one, expressed by totally deaccented material or H+L* L* contours, with upstep H tones.
Starting from the perspective that discourse structure arises from the presence of coherence relations, we provide a map of linguistic discourse structuring devices (DRDs), and focus on those for written text. We propose to structure these items by differentiating between primary and secondary connectives on the one hand, and free connecting phrases on the other. For the former, we propose that their behavior can be described by lexicons, and we show one concrete proposal that by now has been applied to three languages, with others being added in ongoing work. The lexical representations can be useful both for humans (theoretical investigations, transfer to other languages) and for machines (automatic discourse parsing and generation).
During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss the appropriate choice of evaluation metrics for the learning objective.