2021
Proceedings of the Fifth Workshop on Teaching NLP
David Jurgens, Varada Kolhatkar, Lucy Li, Margot Mieskes, Ted Pedersen
Proceedings of the Fifth Workshop on Teaching NLP
2018
Anaphora Resolution with the ARRAU Corpus
Massimo Poesio, Yulia Grishina, Varada Kolhatkar, Nafise Moosavi, Ina Roesiger, Adam Roussel, Fabian Simonjetz, Alexandra Uma, Olga Uryupina, Juntao Yu, Heike Zinsmeister
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference
The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. Its most distinctive feature is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs, and annotating a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. However, the corpus has so far not been used extensively in anaphora resolution research. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task (identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis); the evaluation scripts that assess system performance on those datasets; and preliminary results on these three tasks that may serve as a baseline for subsequent research on these phenomena.
Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey
Varada Kolhatkar, Adam Roussel, Stefanie Dipper, Heike Zinsmeister
Computational Linguistics, Volume 44, Issue 3 - September 2018
This article provides an extensive overview of the literature on non-nominal-antecedent anaphora (also known as abstract anaphora or discourse deixis), a type of anaphora in which an anaphor like “that” refers to an antecedent that is syntactically non-nominal, such as the first sentence in “It’s way too hot here. That’s why I’m moving to Alaska.” Annotating and automatically resolving these cases of anaphora is interesting in its own right because of the complexities involved in identifying non-nominal antecedents, which typically represent abstract objects such as events, facts, and propositions. Resolving non-nominal-antecedent anaphora also has practical value, as it would help computational systems in machine translation, summarization, and question answering, and conceivably in any other task that depends on some measure of text understanding. Most existing approaches to anaphora annotation and resolution focus on nominal-antecedent anaphora, classifying many of the cases where the antecedents are syntactically non-nominal as non-anaphoric. Some work has been done on this topic, but it remains scattered and difficult to collect and assess. With this article, we hope to bring together and synthesize work done in disparate contexts up to now in order to identify fundamental problems and draw conclusions from an overarching perspective. A good picture of the current state of the art in this field can help researchers direct their efforts to where they are most needed. Because of the great variety of theoretical approaches that have been brought to bear on the problem, there is an equally diverse array of terminologies used to describe it, so we provide an overview and discussion of these terminologies.
We also describe the linguistic properties of non-nominal-antecedent anaphora, examine previous annotation efforts that have addressed this topic, and present the computational approaches that aim at resolving non-nominal-antecedent anaphora automatically. We close with a review of the remaining open questions in this area and some of our recommendations for future research.
2017
Constructive Language in News Comments
Varada Kolhatkar, Maite Taboada
Proceedings of the First Workshop on Abusive Language Online
We discuss the characteristics of constructive news comments, and present methods to identify them. First, we define the notion of constructiveness. Second, we annotate a corpus for constructiveness. Third, we explore whether available argumentation corpora can be useful to identify constructiveness in news comments. Our model trained on argumentation corpora achieves a top accuracy of 72.59% (baseline=49.44%) on our crowd-annotated test data. Finally, we examine the relation between constructiveness and toxicity. In our crowd-annotated data, 21.42% of the non-constructive comments and 17.89% of the constructive comments are toxic, suggesting that non-constructive comments are not much more toxic than constructive comments.
Using New York Times Picks to Identify Constructive Comments
Varada Kolhatkar, Maite Taboada
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
We examine the extent to which constructive online comments can be identified automatically. We build several classifiers using New York Times Picks as positive examples and non-constructive thread comments from the Yahoo News Annotated Comments Corpus as negative examples. We evaluate these classifiers on a crowd-annotated corpus of 1,121 comments. Our best classifier achieves an F1 score of 0.84.
2014
Resolving Shell Nouns
Varada Kolhatkar, Graeme Hirst
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2013
Annotating Anaphoric Shell Nouns with their Antecedents
Varada Kolhatkar, Heike Zinsmeister, Graeme Hirst
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse
Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data
Varada Kolhatkar, Heike Zinsmeister, Graeme Hirst
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
2012
Resolving “This-issue” Anaphora
Varada Kolhatkar, Graeme Hirst
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
2009
WordNet::SenseRelate::AllWords - A Broad Coverage Word Sense Tagger that Maximizes Semantic Relatedness
Ted Pedersen, Varada Kolhatkar
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Demonstration Session