Kathryn Conger


2022

pdf
PropBank Comes of Age—Larger, Smarter, and more Diverse
Sameer Pradhan | Julia Bonn | Skatje Myers | Kathryn Conger | Tim O’gorman | James Gung | Kristin Wright-bettner | Martha Palmer
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

This paper describes the evolution of the PropBank approach to semantic role labeling over the last two decades. During this time the PropBank frame files have been expanded to include non-verbal predicates such as adjectives, prepositions and multi-word expressions. The number of domains, genres and languages that have been PropBanked has also expanded greatly, creating an opportunity for much more challenging and robust testing of the generalization capabilities of PropBank semantic role labeling systems. We also describe the substantial effort that has gone into ensuring the consistency and reliability of the various annotated datasets and resources, to better support the training and evaluation of such systems

2021

pdf
SemLink 2.0: Chasing Lexical Resources
Kevin Stowe | Jenette Preciado | Kathryn Conger | Susan Windisch Brown | Ghazaleh Kazeminejad | James Gung | Martha Palmer
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

The SemLink resource provides mappings between a variety of lexical semantic ontologies, each with their strengths and weaknesses. To take advantage of these differences, the ability to move between resources is essential. This work describes advances made to improve the usability of the SemLink resource: the automatic addition of new instances and mappings, manual corrections, sense-based vectors and collocation information, and architecture built to automatically update the resource when versions of the underlying resources change. These updates improve coverage, provide new tools to leverage the capabilities of these resources, and facilitate seamless updates, ensuring the consistency and applicability of these mappings in the future.

2020

pdf
The Russian PropBank
Sarah Moeller | Irina Wagner | Martha Palmer | Kathryn Conger | Skatje Myers
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents a proposition bank for Russian (RuPB), a resource for semantic role labeling (SRL). The motivating goal for this resource is to automatically project semantic role labels from English to Russian. This paper describes frame creation strategies, coverage, and the process of sense disambiguation. It discusses language-specific issues that complicated the process of building the PropBank and how these challenges were exploited as language-internal guidance for consistency and coherence.

2016

pdf
A Corpus of Preposition Supersenses
Nathan Schneider | Jena D. Hwang | Vivek Srikumar | Meredith Green | Abhijit Suresh | Kathryn Conger | Tim O’Gorman | Martha Palmer
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

pdf
Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
Xuansong Li | Martha Palmer | Nianwen Xue | Lance Ramshaw | Mohamed Maamouri | Ann Bies | Kathryn Conger | Stephen Grimes | Stephanie Strassel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

High accuracy for automated translation and information retrieval calls for linguistic annotations at various language levels. The plethora of informal internet content sparked the demand for porting state-of-art natural language processing (NLP) applications to new social media as well as diverse language adaptation. Effort launched by the BOLT (Broad Operational Language Translation) program at DARPA (Defense Advanced Research Projects Agency) successfully addressed the internet information with enhanced NLP systems. BOLT aims for automated translation and linguistic analysis for informal genres of text and speech in online and in-person communication. As a part of this program, the Linguistic Data Consortium (LDC) developed valuable linguistic resources in support of the training and evaluation of such new technologies. This paper focuses on methodologies, infrastructure, and procedure for developing linguistic annotation at various language levels, including Treebank (TB), word alignment (WA), PropBank (PB), and co-reference (CoRef). Inspired by the OntoNotes approach with adaptations to the tasks to reflect the goals and scope of the BOLT project, this effort has introduced more annotation types of informal and free-style genres in English, Chinese and Egyptian Arabic. The corpus produced is by far the largest multi-lingual, multi-level and multi-genre annotation corpus of informal text and speech.

2014

pdf
PropBank: Semantics of New Predicate Types
Claire Bonial | Julia Bonn | Kathryn Conger | Jena D. Hwang | Martha Palmer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This research focuses on expanding PropBank, a corpus annotated with predicate argument structures, with new predicate types; namely, noun, adjective and complex predicates, such as Light Verb Constructions. This effort is in part inspired by a sister project to PropBank, the Abstract Meaning Representation project, which also attempts to capture “who is doing what to whom” in a sentence, but does so in a way that abstracts away from syntactic structures. For example, alternate realizations of a ‘destroying’ event in the form of either the verb ‘destroy’ or the noun ‘destruction’ would receive the same Abstract Meaning Representation. In order for PropBank to reach the same level of coverage and continue to serve as the bedrock for Abstract Meaning Representation, predicate types other than verbs, which have previously gone without annotation, must be annotated. This research describes the challenges therein, including the development of new annotation practices that walk the line between abstracting away from language-particular syntactic facts to explore deeper semantics, and maintaining the connection between semantics and syntactic structures that has proven to be very valuable for PropBank as a corpus of training data for Natural Language Processing applications.