Bonnie Webber

Also published as: Bonnie Lynn Webber, B.L. Nash-Webber, B. Webber, Bonnie L. Webber

2025

pdf bib abs
Culture Matters in Toxic Language Detection in Persian
Zahra Bokaei | Walid Magdy | Bonnie Webber
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Toxic language detection is crucial for creating safer online environments and limiting the spread of harmful content. While toxic language detection has been under-explored in Persian, the current work compares different methods for this task, including fine-tuning, data enrichment, zero-shot and few-shot learning, and cross-lingual transfer learning. What is especially compelling is the impact of cultural context on transfer learning for this task: We show that the language of a country with cultural similarities to Persian yields better results in transfer learning. Conversely, the improvement is lower when the language comes from a culturally distinct country.

pdf bib abs
“Otherwise” in Context: Exploring Discourse Functions with Language Models
Guifu Liu | Bonnie Webber | Hannah Rohde
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

Discourse adverbials are key features of discourse coherence, but their function is often ambiguous. In this work, we investigate how the discourse function of otherwise varies in different contexts. We revise the function set in Rohde et al. (2018b) to account for a new meaning we have encountered. In turn, we create the “otherwise” corpus, a dataset of naturally occurring passages annotated for discourse functions, and identify lexical signals that make a function available with a corpus study. We define continuation acceptability, a metric based on surprisal to probe language models for what they take the function of otherwise to be in a given context. Our experiments show that one can improve function inference by focusing solely on tokens up to and including the head verb of the continuation (i.e., otherwise clause) that have the most varied surprisal across function-disambiguating discourse markers. Lastly, we observe that some of these tokens confirm lexical signals we found in our earlier corpus study, which provides some promising evidence to motivate future pragmatic studies in language models

pdf bib abs
Multi-token Mask-filling and Implicit Discourse Relations
Meinan Liu | Yunfang Dong | Xixian Liao | Bonnie Webber
Findings of the Association for Computational Linguistics: EMNLP 2025

Previous work has shown that simple mask-filling can provide useful information about the discourse informativeness of syntactic structures. Dong et al. (2024) first adopted this approach to investigating preposing constructions. The problem with single token mask fillers was that they were, by and large, ambiguous. We address the issue by adapting the approach of Kalinsky et al. (2023) to support the prediction of multi-token connectives in masked positions. Our first experiment demonstrates that this multi-token mask-filling approach substantially outperforms the previously considered single-token approach in recognizing implicit discourse relations. Our second experiment corroborates previous findings, providing additional empirical support for the role of preposed syntactic constituents in signaling discourse coherence. Overall, our study extends existing mask-filling methods to a new discourse-level task and reinforces the linguistic hypothesis concerning the discourse informativeness of preposed structures.

pdf bib abs
Superlatives in Context: Modeling the Implicit Semantics of Superlatives
Valentina Pyatkin | Bonnie Webber | Ido Dagan | Reut Tsarfaty
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Superlatives are used to single out elements with a maximal/minimal property. Semantically, superlatives perform a set comparison: something (or some things) has the min/max property out of a set. As such, superlatives provide an ideal phenomenon for studying implicit phenomena and discourse restrictions. While this comparison set is often not explicitly defined, its (implicit) restrictions can be inferred from the discourse context the expression appears in. In this work we provide an extensive computational study on the semantics of superlatives. We propose a unified account of superlative semantics which allows us to derive a broad-coverage annotation schema. Using this unified schema we annotated a multi-domain dataset of superlatives and their semantic interpretations. We specifically focus on interpreting implicit or ambiguous superlative expressions, by analyzing how the discourse context restricts the set of interpretations. In a set of experiments we then analyze how well models perform at variations of predicting superlative semantics, with and without context. We show that the fine-grained semantics of superlatives in context can be challenging for contemporary models, including GPT-4.

2024

pdf bib abs
Syntactic Preposing and Discourse Relations
Yunfang Dong | Xixian Liao | Bonnie Webber
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Over 15 years ago, Ward & Birner (2006) suggested that non-canonical constructions in English can serve both to mark information status and to structure the information flow of discourse. One such construction is preposing, where a phrasal constituent appears to the left of its canonical position, typically sentence-initially. But computational work on discourse has, to date, ignored non-canonical syntax. We take account of non-canonical syntax by providing quantitative evidence relating NP/PP preposing to discourse relations. The evidence comes from an LLM mask-filling task that compares the predictions when a mask is inserted between the arguments of an implicit inter-sentential discourse relation — first, when the right-hand argument (Arg2) starts with a preposed constituent, and again, when that constituent is in canonical (post-verbal) position. Results show that (1) the top-ranked mask-fillers in the preposed case agree more often with “gold” annotations in the Penn Discourse TreeBank than they do in the latter case, and (2) preposing in Arg2 can affect the distribution of discourse-relational senses.

pdf bib abs
Multi-Label Classification for Implicit Discourse Relation Recognition
Wanqiu Long | N. Siddharth | Bonnie Webber
Findings of the Association for Computational Linguistics: ACL 2024

Discourse relations play a pivotal role in establishing coherence within textual content, uniting sentences and clauses into a cohesive narrative. The Penn Discourse Treebank (PDTB) stands as one of the most extensively utilized datasets in this domain. In PDTB-3, the annotators can assign multiple labels to an example, when they believe the simultaneous presence of multiple relations. Prior research in discourse relation recognition has treated these instances as separate examples during training, with a gold-standard prediction matching one of the labels considered correct at test time. However, this approach is inadequate, as it fails to account for the interdependence of labels in real-world contexts and to distinguish between cases where only one sense relation holds and cases where multiple relations hold simultaneously. In our work, we address this challenge by exploring various multi-label classification frameworks to handle implicit discourse relation recognition. We show that the methods for multi-label prediction don’t depress performance for single-label prediction. Additionally, we give comprehensive analysis of results and data. Our work contributes to advancing the understanding and application of discourse relations and provide a foundation for the future study.

2023

pdf bib abs
Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future
Jan-Christoph Klie | Bonnie Webber | Iryna Gurevych
Computational Linguistics, Volume 49, Issue 1 - March 2023

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising number of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have been devised over the years. While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets. This raises strong concerns on methods’ general performance and makes it difficult to assess their strengths and weaknesses. We therefore reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets for text classification as well as token and span labeling. In addition, we define a uniform evaluation setup including a new formalization of the annotation error detection task, evaluation protocol, and general best practices. To facilitate future research and reproducibility, we release our datasets and implementations in an easy-to-use and open source software package.1

pdf bib abs
A Joint Matrix Factorization Analysis of Multilingual Representations
Zheng Zhao | Yftah Ziser | Bonnie Webber | Shay Cohen
Findings of the Association for Computational Linguistics: EMNLP 2023

We present an analysis tool based on joint matrix factorization for comparing latent representations of multilingual and monolingual models. An alternative to probing, this tool allows us to analyze multiple sets of representations in a joint manner. Using this tool, we study to what extent and how morphosyntactic features are reflected in the representations learned by multilingual pre-trained models. We conduct a large-scale empirical study of over 33 languages and 17 morphosyntactic categories. Our findings demonstrate variations in the encoding of morphosyntactic information across upper and lower layers, with category-specific differences influenced by language properties. Hierarchical clustering of the factorization outputs yields a tree structure that is related to phylogenetic trees manually crafted by linguists. Moreover, we find the factorization outputs exhibit strong associations with performance observed across different cross-lingual tasks. We release our code to facilitate future research.

Translating literary works has perennially stood as an elusive dream in machine translation (MT), a journey steeped in intricate challenges. To foster progress in this domain, we hold a new shared task at WMT 2023, the first edition of the Discourse-Level Literary Translation. First, we (Tencent AI Lab and China Literature Ltd.) release a copyrighted and document-level Chinese-English web novel corpus. Furthermore, we put forth an industry-endorsed criteria to guide human evaluation process. This year, we totally received 14 submissions from 7 academia and industry teams. We employ both automatic and human evaluations to measure the performance of the submitted systems. The official ranking of the systems is based on the overall human judgments. In addition, our extensive analysis reveals a series of interesting findings on literary and discourse-aware MT. We release data, system outputs, and leaderboard at http://www2.statmt.org/wmt23/literary-translation-task.html.

2022

pdf bib abs
Facilitating Contrastive Learning of Discourse Relational Senses by Exploiting the Hierarchy of Sense Relations
Wanqiu Long | Bonnie Webber
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Implicit discourse relation recognition is a challenging task that involves identifying the sense or senses that hold between two adjacent spans of text, in the absense of an explicit connective between them. In both PDTB-2 (prasad et al., 2008) and PDTB-3 (Webber et al., 2019), discourse relational senses are organized into a three-level hierarchy ranging from four broad top-level senses, to more specific senses below them. Most previous work on implicitf discourse relation recognition have used the sense hierarchy simply to indicate what sense labels were available. Here we do more — incorporating the sense hierarchy into the recognition process itself and using it to select the negative examples used in contrastive learning. With no additional effort, the approach achieves state-of-the-art performance on the task. Our code is released inhttps://github.com/wanqiulong 0923/Contrastive_IDRR.

pdf bib abs
Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization
Amr Keleg | Matthias Lindemann | Danyang Liu | Wanqiu Long | Bonnie L. Webber
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

Recent improvements in automatic news summarization fundamentally rely on large corpora of news articles and their summaries. These corpora are often constructed by scraping news websites, which results in including not only summaries but also other kinds of texts. Apart from more generic noise, we identify straplines as a form of text scraped from news websites that commonly turn out not to be summaries. The presence of these non-summaries threatens the validity of scraped corpora as benchmarks for news summarization. We have annotated extracts from two news sources that form part of the Newsroom corpus (Grusky et al., 2018), labeling those which were straplines, those which were summaries, and those which were both. We present a rule-based strapline detection method that achieves good performance on a manually annotated test set. Automatic evaluation indicates that removing straplines and noise from the training data of a news summarizer results in higher quality summaries, with improvements as high as 7 points ROUGE score.

2021

pdf bib abs
Kathy McKeown Interviews Bonnie Webber
Bonnie Webber
Computational Linguistics, Volume 47, Issue 1 - March 2021

Because the 2020 ACL Lifetime Achievement Award presentation could not be done in person, we replaced the usual LTA talk with an interview between Professor Kathy McKeown (Columbia University) and the recipient, Bonnie Webber. The following is an edited version of the interview, with added citations.

pdf bib abs
Revisiting Shallow Discourse Parsing in the PDTB-3: Handling Intra-sentential Implicits
Zheng Zhao | Bonnie Webber
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

In the PDTB-3, several thousand implicit discourse relations were newly annotated within individual sentences, adding to the over 15,000 implicit relations annotated across adjacent sentences in the PDTB-2. Given that the position of the arguments to these intra-sentential implicits is no longer as well-defined as with inter-sentential implicits, a discourse parser must identify both their location and their sense. That is the focus of the current work. The paper provides a comprehensive analysis of our results, showcasing model performance under different scenarios, pointing out limitations and noting future directions.

pdf bib abs
Refocusing on Relevance: Personalization in NLG
Shiran Dudy | Steven Bedrick | Bonnie Webber
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Many NLG tasks such as summarization, dialogue response, or open domain question answering, focus primarily on a source text in order to generate a target response. This standard approach falls short, however, when a user’s intent or context of work is not easily recoverable based solely on that source text– a scenario that we argue is more of the rule than the exception. In this work, we argue that NLG systems in general should place a much higher level of emphasis on making use of additional context, and suggest that relevance (as used in Information Retrieval) be thought of as a crucial tool for designing user-oriented text-generating tasks. We further discuss possible harms and hazards around such personalization, and argue that value-sensitive design represents a crucial path forward through these challenges.

pdf bib abs
Frustratingly Simple but Surprisingly Strong: Using Language-Independent Features for Zero-shot Cross-lingual Semantic Parsing
Jingfeng Yang | Federico Fancellu | Bonnie Webber | Diyi Yang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The availability of corpora has led to significant advances in training semantic parsers in English. Unfortunately, for languages other than English, annotated data is limited and so is the performance of the developed parsers. Recently, pretrained multilingual models have been proven useful for zero-shot cross-lingual transfer in many NLP tasks. What else does it require to apply a parser trained in English to other languages for zero-shot cross-lingual semantic parsing? Will simple language-independent features help? To this end, we experiment with six Discourse Representation Structure (DRS) semantic parsers in English, and generalize them to Italian, German and Dutch, where there are only a small number of manually annotated parses available. Extensive experiments show that despite its simplicity, adding Universal Dependency (UD) relations and Universal POS tags (UPOS) as model-agnostic features achieves surprisingly strong improvement on all parsers.

2020

pdf bib abs
Extending Implicit Discourse Relation Recognition to the PDTB-3
Li Liang | Zheng Zhao | Bonnie Webber
Proceedings of the First Workshop on Computational Approaches to Discourse

The PDTB-3 contains many more Implicit discourse relations than the previous PDTB-2. This is in part because implicit relations have now been annotated within sentences as well as between them. In addition, some now co-occur with explicit discourse relations, instead of standing on their own. Here we show that while this can complicate the problem of identifying the location of implicit discourse relations, it can in turn simplify the problem of identifying their senses. We present data to support this claim, as well as methods that can serve as a non-trivial baseline for future state-of-the-art recognizers for implicit discourse relations.

pdf bib
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Bonnie Webber | Trevor Cohn | Yulan He | Yang Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib abs
TED-CDB: A Large-Scale Chinese Discourse Relation Dataset on TED Talks
Wanqiu Long | Bonnie Webber | Deyi Xiong
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

As different genres are known to differ in their communicative properties and as previously, for Chinese, discourse relations have only been annotated over news text, we have created the TED-CDB dataset. TED-CDB comprises a large set of TED talks in Chinese that have been manually annotated according to the goals and principles of Penn Discourse Treebank, but adapted to features that are not present in English. It serves as a unique Chinese corpus of spoken discourse. Benchmark experiments show that TED-CDB poses a challenge for state-of-the-art discourse relation classifiers, whose F1 performance on 4-way classification is 60%. This is a dramatic drop of 35% from performance on the news text in the Chinese Discourse Treebank. Transfer learning experiments have been carried out with the TED-CDB for both same-language cross-domain transfer and same-domain cross-language transfer. Both demonstrate that the TED-CDB can improve the performance of systems being developed for languages other than Chinese and would be helpful for insufficient or unbalanced data in other corpora. The dataset and our Chinese annotation guidelines will be made freely available.

pdf bib abs
Reducing Quantity Hallucinations in Abstractive Summarization
Zheng Zhao | Shay B. Cohen | Bonnie Webber
Findings of the Association for Computational Linguistics: EMNLP 2020

It is well-known that abstractive summaries are subject to hallucination—including material that is not supported by the original text. While summaries can be made hallucination-free by limiting them to general phrases, such summaries would fail to be very informative. Alternatively, one can try to avoid hallucinations by verifying that any specific entities in the summary appear in the original text in a similar context. This is the approach taken by our system, Herman. The system learns to recognize and verify quantity entities (dates, numbers, sums of money, etc.) in a beam-worth of abstractive summaries produced by state-of-the-art models, in order to up-rank those summaries whose quantity terms are supported by the original text. Experimental results demonstrate that the ROUGE scores of such up-ranked summaries have a higher Precision than summaries that have not been up-ranked, without a comparable loss in Recall, resulting in higher F1. Preliminary human evaluation of up-ranked vs. original summaries shows people’s preference for the former.

pdf bib abs
Bridging Question Answering and Discourse The case of Multi-Sentence Questions
Bonnie Webber
Proceedings of the Second International Workshop of Discourse Processing

In human question-answering (QA), questions are often expressed in the form of multiple sentences. One can see this in both spoken QA interactions, when one person asks a question of another, and written QA, such as are found on-line in FAQs and in what are called ”Community Question-Answering Forums”. Computer-based QA has taken the challenge of these ”multi-sentence questions” to be that of breaking them into an appropriately ordered sequence of separate questions, with both the previous questions and their answers serving as context for the next question. This can be seen, for example, in two recent workshops at AAAI called ”Reasoning for Complex QA” [https://rcqa-ws.github.io/program/]. We claim that, while appropriate for some types of ”multi-sentence questions” (MSQs), it is not appropriate for all, because they are essentially different types of discourse. To support this claim, we need to provide evidence that: • different types of MSQs are answered differently in written or spoken QA between people; • people can (and do) distinguish these different types of MSQs; • systems can be made to both distinguish different types of MSQs and provide appropriate answers.

pdf bib abs
Querent Intent in Multi-Sentence Questions
Laurie Burchell | Jie Chi | Tom Hosking | Nina Markl | Bonnie Webber
Proceedings of the 14th Linguistic Annotation Workshop

Multi-sentence questions (MSQs) are sequences of questions connected by relations which, unlike sequences of standalone questions, need to be answered as a unit. Following Rhetorical Structure Theory (RST), we recognise that different “question discourse relations” between the subparts of MSQs reflect different speaker intents, and consequently elicit different answering strategies. Correctly identifying these relations is therefore a crucial step in automatically answering MSQs. We identify five different types of MSQs in English, and define five novel relations to describe them. We extract over 162,000 MSQs from Stack Exchange to enable future research. Finally, we implement a high-precision baseline classifier based on surface features.

pdf bib abs
Shallow Discourse Annotation for Chinese TED Talks
Wanqiu Long | Xinyi Cai | James Reid | Bonnie Webber | Deyi Xiong
Proceedings of the Twelfth Language Resources and Evaluation Conference

Text corpora annotated with language-related properties are an important resource for the development of Language Technology. The current work contributes a new resource for Chinese Language Technology and for Chinese-English translation, in the form of a set of TED talks (some originally given in English, some in Chinese) that have been annotated with discourse relations in the style of the Penn Discourse TreeBank, adapted to properties of Chinese text that are not present in English. The resource is currently unique in annotating discourse-level properties of planned spoken monologues rather than of written text. An inter-annotator agreement study demonstrates that the annotation scheme is able to achieve highly reliable results.

2019

pdf bib abs
GECOR: An End-to-End Generative Ellipsis and Co-reference Resolution Model for Task-Oriented Dialogue
Jun Quan | Deyi Xiong | Bonnie Webber | Changjian Hu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Ellipsis and co-reference are common and ubiquitous especially in multi-turn dialogues. In this paper, we treat the resolution of ellipsis and co-reference in dialogue as a problem of generating omitted or referred expressions from the dialogue context. We therefore propose a unified end-to-end Generative Ellipsis and CO-reference Resolution model (GECOR) in the context of dialogue. The model can generate a new pragmatically complete user utterance by alternating the generation and copy mode for each user utterance. A multi-task learning framework is further proposed to integrate the GECOR into an end-to-end task-oriented dialogue. In order to train both the GECOR and the multi-task learning framework, we manually construct a new dataset on the basis of the public dataset CamRest676 with both ellipsis and co-reference annotation. On this dataset, intrinsic evaluations on the resolution of ellipsis and co-reference show that the GECOR model significantly outperforms the sequence-to-sequence (seq2seq) baseline model in terms of EM, BLEU and F1 while extrinsic evaluations on the downstream dialogue task demonstrate that our multi-task learning framework with GECOR achieves a higher success rate of task completion than TSCP, a state-of-the-art end-to-end task-oriented dialogue model.

pdf bib abs
Classifying Author Intention for Writer Feedback in Related Work
Arlene Casey | Bonnie Webber | Dorota Glowacka
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The ability to produce high-quality publishable material is critical to academic success but many Post-Graduate students struggle to learn to do so. While recent years have seen an increase in tools designed to provide feedback on aspects of writing, one aspect that has so far been neglected is the Related Work section of academic research papers. To address this, we have trained a supervised classifier on a corpus of 94 Related Work sections and evaluated it against a manually annotated gold standard. The classifier uses novel features pertaining to citation types and co-reference, along with patterns found from studying Related Works. We show that these novel features contribute to classifier performance with performance being favourable compared to other similar works that classify author intentions and consider feedback for academic writing.

pdf bib abs
Ambiguity in Explicit Discourse Connectives
Bonnie Webber | Rashmi Prasad | Alan Lee
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Discourse connectives are known to be subject to both usage and sense ambiguity, as has already been discussed in the literature. But discourse connectives are no different from other linguistic expressions in being subject to other types of ambiguity as well. Four are illustrated and discussed here.

pdf bib abs
A Framework for Annotating ‘Related Works’ to Support Feedback to Novice Writers
Arlene Casey | Bonnie Webber | Dorota Glowacka
Proceedings of the 13th Linguistic Annotation Workshop

Understanding what is expected of academic writing can be difficult for novice writers to assimilate, and recent years have seen several automated tools become available to support academic writing. Our work presents a framework for annotating features of the Related Work section of academic writing, that supports writer feedback.

2018

pdf bib abs
Getting to “Hearer-old”: Charting Referring Expressions Across Time
Ieva Staliūnaitė | Hannah Rohde | Bonnie Webber | Annie Louis
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

When a reader is first introduced to an entity, its referring expression must describe the entity. For entities that are widely known, a single word or phrase often suffices. This paper presents the first study of how expressions that refer to the same entity develop over time. We track thousands of person and organization entities over 20 years of New York Times (NYT). As entities move from hearer-new (first introduction to the NYT audience) to hearer-old (common knowledge) status, we show empirically that the referring expressions along this trajectory depend on the type of the entity, and exhibit linguistic properties related to becoming common knowledge (e.g., shorter length, less use of appositives, more definiteness). These properties can also be used to build a model to predict how long it will take for an entity to reach hearer-old status. Our results reach 10-30% absolute improvement over a majority-class baseline.

pdf bib
Obituary: Aravind K. Joshi
Bonnie Webber
Computational Linguistics, Volume 44, Issue 3 - September 2018

pdf bib
Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method
Yutong Shao | Rico Sennrich | Bonnie Webber | Federico Fancellu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
NegPar: A parallel corpus annotated for negation
Qianchu Liu | Federico Fancellu | Bonnie Webber
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs
Discourse Coherence: Concurrent Explicit and Implicit Relations
Hannah Rohde | Alexander Johnson | Nathan Schneider | Bonnie Webber
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Theories of discourse coherence posit relations between discourse segments as a key feature of coherent text. Our prior work suggests that multiple discourse relations can be simultaneously operative between two segments for reasons not predicted by the literature. Here we test how this joint presence can lead participants to endorse seemingly divergent conjunctions (e.g., BUT and SO) to express the link they see between two segments. These apparent divergences are not symptomatic of participant naivety or bias, but arise reliably from the concurrent availability of multiple relations between segments – some available through explicit signals and some via inference. We believe that these new results can both inform future progress in theoretical work on discourse coherence and lead to higher levels of performance in discourse parsing.

pdf bib
Discourse Annotation in the PDTB: The Next Generation
Rashmi Prasad | Bonnie Webber | Alan Lee
Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

2017

pdf bib abs
Detecting negation scope is easy, except when it isn’t
Federico Fancellu | Adam Lopez | Bonnie Webber | Hangfeng He
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Several corpora have been annotated with negation scope—the set of words whose meaning is negated by a cue like the word “not”—leading to the development of classifiers that detect negation scope with high accuracy. We show that for nearly all of these corpora, this high accuracy can be attributed to a single fact: they frequently annotate negation scope as a single span of text delimited by punctuation. For negation scopes not of this form, detection accuracy is low and under-sampling the easy training examples does not substantially improve accuracy. We demonstrate that this is partly an artifact of annotation guidelines, and we argue that future negation scope annotation efforts should focus on these more difficult cases.

pdf bib abs
Discourse Relations and Conjoined VPs: Automated Sense Recognition
Valentina Pyatkin | Bonnie Webber
Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics

Sense classification of discourse relations is a sub-task of shallow discourse parsing. Discourse relations can occur both across sentences (inter-sentential) and within sentences (intra-sentential), and more than one discourse relation can hold between the same units. Using a newly available corpus of discourse-annotated intra-sentential conjoined verb phrases, we demonstrate a sequential classification pipeline for their multi-label sense classification. We assess the importance of each feature used in the classification, the feature scope, and what is lost in moving from gold standard manual parses to the output of an off-the-shelf parser.

pdf bib abs
Universal Dependencies to Logical Form with Negation Scope
Federico Fancellu | Siva Reddy | Adam Lopez | Bonnie Webber
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

Many language technology applications would benefit from the ability to represent negation and its scope on top of widely-used linguistic resources. In this paper, we investigate the possibility of obtaining a first-order logic representation with negation scope marked using Universal Dependencies. To do so, we enhance UDepLambda, a framework that converts dependency graphs to logical forms. The resulting UDepLambda¬ is able to handle phenomena related to scope by means of an higher-order type theory, relevant not only to negation but also to universal quantification and other complex semantic phenomena. The initial conversion we did for English is promising, in that one can represent the scope of negation also in the presence of more complex phenomena such as universal quantifiers.

pdf bib abs
Neural Networks for Negation Cue Detection in Chinese
Hangfeng He | Federico Fancellu | Bonnie Webber
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

Negation cue detection involves identifying the span inherently expressing negation in a negative sentence. In Chinese, negative cue detection is complicated by morphological proprieties of the language. Previous work has shown that negative cue detection in Chinese can benefit from specific lexical and morphemic features, as well as cross-lingual information. We show here that they are not necessary: A bi-directional LSTM can perform equally well, with minimal feature engineering. In particular, the use of a character-based model allows us to capture characteristics of negation cues in Chinese using word-embedding information only. Not only does our model performs on par with previous work, further error analysis clarifies what problems remain to be addressed.

pdf bib
Proceedings of the Third Workshop on Discourse in Machine Translation
Bonnie Webber | Andrei Popescu-Belis | Jörg Tiedemann
Proceedings of the Third Workshop on Discourse in Machine Translation

pdf bib
Exploring Substitutability through Discourse Adverbials and Multiple Judgments
Hannah Rohde | Anna Dickinson | Nathan Schneider | Annie Louis | Bonnie Webber
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers

2016

pdf bib abs
Annotating Discourse Relations with the PDTB Annotator
Alan Lee | Rashmi Prasad | Bonnie Webber | Aravind K. Joshi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The PDTB Annotator is a tool for annotating and adjudicating discourse relations based on the annotation framework of the Penn Discourse TreeBank (PDTB). This demo describes the benefits of using the PDTB Annotator, gives an overview of the PDTB Framework and discusses the tool’s features, setup requirements and how it can also be used for adjudication.

pdf bib abs
Inconsistency Detection in Semantic Annotation
Nora Hollenstein | Nathan Schneider | Bonnie Webber
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Inconsistencies are part of any manually annotated corpus. Automatically finding these inconsistencies and correcting them (even manually) can increase the quality of the data. Past research has focused mainly on detecting inconsistency in syntactic annotation. This work explores new approaches to detecting inconsistency in semantic annotation. Two ranking methods are presented in this paper: a discrepancy ranking and an entropy ranking. Those methods are then tested and evaluated on multiple corpora annotated with multiword expressions and supersense labels. The results show considerable improvements in detecting inconsistency candidates over a random baseline. Possible applications of methods for inconsistency detection are improving the annotation procedure as well as the guidelines and correcting errors in completed annotations.

pdf bib
Neural Networks For Negation Scope Detection
Federico Fancellu | Adam Lopez | Bonnie Webber
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Discourse-Annotated Corpus of Conjoined VPs
Bonnie Webber | Rashmi Prasad | Alan Lee | Aravind Joshi
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

pdf bib
Filling in the Blanks in Understanding Discourse Adverbials: Consistency, Conflict, and Context-Dependence in a Crowdsourced Elicitation Task
Hannah Rohde | Anna Dickinson | Nathan Schneider | Christopher N. L. Clark | Annie Louis | Bonnie Webber
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

pdf bib
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods
Annie Louis | Michael Roth | Bonnie Webber | Michael White | Luke Zettlemoyer
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

2015

pdf bib
Translating Negation: Induction, Search And Model Errors
Federico Fancellu | Bonnie Webber
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
Translating Negation: A Manual Error Analysis
Federico Fancellu | Bonnie Webber
Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015)

pdf bib
Proceedings of the Second Workshop on Discourse in Machine Translation
Bonnie Webber | Marine Carpuat | Andrei Popescu-Belis | Christian Hardmeier
Proceedings of the Second Workshop on Discourse in Machine Translation

pdf bib
Analysing ParCor and its Translations by State-of-the-art SMT Systems
Liane Guillou | Bonnie Webber
Proceedings of the Second Workshop on Discourse in Machine Translation

pdf bib
A Maximum Entropy Classifier for Cross-Lingual Pronoun Prediction
Dominikus Wetzel | Adam Lopez | Bonnie Webber
Proceedings of the Second Workshop on Discourse in Machine Translation

pdf bib
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics
Michael Roth | Annie Louis | Bonnie Webber | Tim Baldwin
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics

pdf bib
Recovering discourse relations: Varying influence of discourse adverbials
Hannah Rohde | Anna Dickinson | Chris Clark | Annie Louis | Bonnie Webber
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics

pdf bib
Bridging Sentential and Discourse-level Semantics through Clausal Adjuncts
Rashmi Prasad | Bonnie Webber | Alan Lee | Sameer Pradhan | Aravind Joshi
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics

2014

pdf bib
Structured and Unstructured Cache Models for SMT Domain Adaptation
Annie Louis | Bonnie Webber
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Applying the semantics of negation to SMT through n-best list re-ranking
Federico Fancellu | Bonnie Webber
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation
Rashmi Prasad | Bonnie Webber | Aravind Joshi
Computational Linguistics, Volume 40, Issue 4 - December 2014

pdf bib abs
ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT
Liane Guillou | Christian Hardmeier | Aaron Smith | Jörg Tiedemann | Bonnie Webber
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present ParCor, a parallel corpus of texts in which pronoun coreference ― reduced coreference in which pronouns are used as referring expressions ― has been annotated. The corpus is intended to be used both as a resource from which to learn systematic differences in pronoun use between languages and ultimately for developing and testing informed Statistical Machine Translation systems aimed at addressing the problem of pronoun coreference in translation. At present, the corpus consists of a collection of parallel English-German documents from two different text genres: TED Talks (transcribed planned speech), and EU Bookshop publications (written text). All documents in the corpus have been manually annotated with respect to the type and location of each pronoun and, where relevant, its antecedent. We provide details of the texts that we selected, the guidelines and tools used to support annotation and some corpus statistics. The texts in the corpus have already been translated into many languages, and we plan to expand the corpus into these other languages, as well as other genres, in the future.

pdf bib
Discourse for Machine Translation.
Bonnie Webber
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

We present an approach to automatically identifying the arguments of discourse connectives based on data from the Penn Discourse Treebank. Of the two arguments of connectives, called Arg1 and Arg2, we focus on Arg1, which has proven more challenging to identify. Our approach employs a sentence-based representation of arguments, and distinguishes ""intra-sentential connectives"", which take both their arguments in the same sentence, from ""inter-sentential connectives"", whose arguments are found in different sentences. The latter are further distinguished by paragraph position into ""ParaInit"" connectives, which appear in a paragraph-initial sentence, and ""ParaNonInit"" connectives, which appear elsewhere. The paper focusses on predicting Arg1 of Inter-sentential ParaNonInit connectives, presenting a set of scope-based filters that reduce the search space for Arg1 from all the previous sentences in the paragraph to a subset of them. For cases where these filters do not uniquely identify Arg1, coreference-based heuristics are employed. Our analysis shows an absolute 3% performance improvement over the high baseline of 83.3% for identifying Arg1 of Inter-sentential ParaNonInit connectives.

pdf bib
Discourse Structure: Theory, Practice and Use
Bonnie Webber | Markus Egg | Valia Kordoni
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
K. Bretonnel Cohen | Dina Demner-Fushman | Sophia Ananiadou | John Pestian | Jun’ichi Tsujii | Bonnie Webber
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

2009

pdf bib
Genre distinctions for discourse in the Penn TreeBank
Bonnie Webber
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus
Deniz Zeyrek | Bonnie Webber
Proceedings of the 6th Workshop on Asian Language Resources

We present the second version of the Penn Discourse Treebank, PDTB-2.0, describing its lexically-grounded annotations of discourse relations and their two abstract object arguments over the 1 million word Wall Street Journal corpus. We describe all aspects of the annotation, including (a) the argument structure of discourse relations, (b) the sense annotation of the relations, and (c) the attribution of discourse relations and each of their arguments. We list the differences between PDTB-1.0 and PDTB-2.0. We present representative statistics for several aspects of the annotation in the corpus.

pdf bib
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
Dina Demner-Fushman | Sophia Ananiadou | Kevin Bretonnel Cohen | John Pestian | Jun’ichi Tsujii | Bonnie Webber
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

pdf bib
Topic Indexing and Retrieval for Factoid QA
Kisuh Ahn | Bonnie Webber
Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering