Frances Yung

2021

pdf bib abs
A practical perspective on connective generation
Frances Yung | Merel Scholman | Vera Demberg
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

In data-driven natural language generation, we typically know what relation should be expressed and need to select a connective to lexicalize it. In the current contribution, we analyse whether a sophisticated connective generation module is necessary to select a connective, or whether this can be solved with simple methods (such as random choice between connectives that are known to express a given relation, or usage of a generic language model). Comparing these methods to the distributions of connective choices from a human connective insertion task, we find mixed results: for some relations, it is acceptable to lexicalize them using any of the connectives that mark this relation. However, for other relations (temporals, concessives) either a more detailed relation distinction needs to be introduced, or a more sophisticated connective choice module would be necessary.

pdf bib abs
Comparison of methods for explicit discourse connective identification across various domains
Merel Scholman | Tianai Dong | Frances Yung | Vera Demberg
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

Existing parse methods use varying approaches to identify explicit discourse connectives, but their performance has not been consistently evaluated in comparison to each other, nor have they been evaluated consistently on text other than newspaper articles. We here assess the performance on explicit connective identification of three parse methods (PDTB e2e, Lin et al., 2014; the winner of CONLL2015, Wang et al., 2015; and DisSent, Nie et al., 2019), along with a simple heuristic. We also examine how well these systems generalize to different datasets, namely written newspaper text (PDTB), written scientific text (BioDRB), prepared spoken text (TED-MDB) and spontaneous spoken text (Disco-SPICE). The results show that the e2e parser outperforms the other parse methods in all datasets. However, performance drops significantly from the PDTB to all other datasets. We provide a more fine-grained analysis of domain differences and connectives that prove difficult to parse, in order to highlight the areas where gains can be made.

2019

pdf bib abs
Acquiring Annotated Data with Cross-lingual Explicitation for Implicit Discourse Relation Classification
Wei Shi | Frances Yung | Vera Demberg
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Implicit discourse relation classification is one of the most challenging and important tasks in discourse parsing, due to the lack of connectives as strong linguistic cues. A principle bottleneck to further improvement is the shortage of training data (ca. 18k instances in the Penn Discourse Treebank (PDTB)). Shi et al. (2017) proposed to acquire additional data by exploiting connectives in translation: human translators mark discourse relations which are implicit in the source language explicitly in the translation. Using back-translations of such explicitated connectives improves discourse relation parsing performance. This paper addresses the open question of whether the choice of the translation language matters, and whether multiple translations into different languages can be effectively used to improve the quality of the additional data.

pdf bib abs
Crowdsourcing Discourse Relation Annotations by a Two-Step Connective Insertion Task
Frances Yung | Vera Demberg | Merel Scholman
Proceedings of the 13th Linguistic Annotation Workshop

The perspective of being able to crowd-source coherence relations bears the promise of acquiring annotations for new texts quickly, which could then increase the size and variety of discourse-annotated corpora. It would also open the avenue to answering new research questions: Collecting annotations from a larger number of individuals per instance would allow to investigate the distribution of inferred relations, and to study individual differences in coherence relation interpretation. However, annotating coherence relations with untrained workers is not trivial. We here propose a novel two-step annotation procedure, which extends an earlier method by Scholman and Demberg (2017a). In our approach, coherence relation labels are inferred from connectives that workers insert into the text. We show that the proposed method leads to replicable coherence annotations, and analyse the agreement between the obtained relation labels and annotations from PDTB and RSTDT on the same texts.

2018

pdf bib abs
Do Speakers Produce Discourse Connectives Rationally?
Frances Yung | Vera Demberg
Proceedings of the Eight Workshop on Cognitive Aspects of Computational Language Learning and Processing

A number of different discourse connectives can be used to mark the same discourse relation, but it is unclear what factors affect connective choice. One recent account is the Rational Speech Acts theory, which predicts that speakers try to maximize the informativeness of an utterance such that the listener can interpret the intended meaning correctly. Existing prior work uses referential language games to test the rational account of speakers’ production of concrete meanings, such as identification of objects within a picture. Building on the same paradigm, we design a novel Discourse Continuation Game to investigate speakers’ production of abstract discourse relations. Experimental results reveal that speakers significantly prefer a more informative connective, in line with predictions of the RSA model.

2017

pdf bib abs
Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification
Wei Shi | Frances Yung | Raphael Rubino | Vera Demberg
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Implicit discourse relation recognition is an extremely challenging task due to the lack of indicative connectives. Various neural network architectures have been proposed for this task recently, but most of them suffer from the shortage of labeled data. In this paper, we address this problem by procuring additional training data from parallel corpora: When humans translate a text, they sometimes add connectives (a process known as explicitation). We automatically back-translate it into an English connective and use it to infer a label with high confidence. We show that a training set several times larger than the original training set can be generated this way. With the extra labeled instances, we show that even a simple bidirectional Long Short-Term Memory Network can outperform the current state-of-the-art.

pdf bib abs
Can Discourse Relations be Identified Incrementally?
Frances Yung | Hiroshi Noji | Yuji Matsumoto
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Humans process language word by word and construct partial linguistic structures on the fly before the end of the sentence is perceived. Inspired by this cognitive ability, incremental algorithms for natural language processing tasks have been proposed and demonstrated promising performance. For discourse relation (DR) parsing, however, it is not yet clear to what extent humans can recognize DRs incrementally, because the latent ‘nodes’ of discourse structure can span clauses and sentences. To answer this question, this work investigates incrementality in discourse processing based on a corpus annotated with DR signals. We find that DRs are dominantly signaled at the boundary between the two constituent discourse units. The findings complement existing psycholinguistic theories on expectation in discourse processing and provide direction for incremental discourse parsing.