2022
A Study in Contradiction: Data and Annotation for AIDA Focusing on Informational Conflict in Russia-Ukraine Relations
Jennifer Tracey | Ann Bies | Jeremy Getman | Kira Griffitt | Stephanie Strassel
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper describes data resources created for Phase 1 of the DARPA Active Interpretation of Disparate Alternatives (AIDA) program, which aims to develop language technology that can help humans manage large volumes of sometimes conflicting information to develop a comprehensive understanding of events around the world, even when such events are described in multiple media and languages. Especially important is the need for the technology to be capable of building multiple hypotheses to account for alternative interpretations of data imbued with informational conflict. The corpus described here is designed to support these goals. It focuses on the domain of Russia-Ukraine relations and contains multimedia source data in English, Russian and Ukrainian, annotated to support development and evaluation of systems that perform extraction of entities, events, and relations from individual multimedia documents, aggregate the information across documents and languages, and produce multiple “hypotheses” about what has happened. This paper describes source data collection, annotation, and assessment.
2019
Corpus Building for Low Resource Languages in the DARPA LORELEI Program
Jennifer Tracey | Stephanie Strassel | Ann Bies | Zhiyi Song | Michael Arrigo | Kira Griffitt | Dana Delgado | Dave Graff | Seth Kulick | Justin Mott | Neil Kuster
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages
2018
Simple Semantic Annotation and Situation Frames: Two Approaches to Basic Text Understanding in LORELEI
Kira Griffitt | Jennifer Tracey | Ann Bies | Stephanie Strassel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation
Claire Bonial | Bianca Badarau | Kira Griffitt | Ulf Hermjakob | Kevin Knight | Tim O’Gorman | Martha Palmer | Nathan Schneider
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
AMR Beyond the Sentence: the Multi-sentence AMR corpus
Tim O’Gorman | Michael Regan | Kira Griffitt | Ulf Hermjakob | Kevin Knight | Martha Palmer
Proceedings of the 27th International Conference on Computational Linguistics
There are few corpora that endeavor to represent the semantic content of entire documents. We present a corpus that accomplishes one way of capturing document level semantics, by annotating coreference and similar phenomena (bridging and implicit roles) on top of gold Abstract Meaning Representations of sentence-level semantics. We present a new corpus of this annotation, with analysis of its quality, alongside a plausible baseline for comparison. It is hoped that this Multi-Sentence AMR corpus (MS-AMR) may become a feasible method for developing rich representations of document meaning, useful for tasks such as information extraction and question answering.
2016
The Query of Everything: Developing Open-Domain, Natural-Language Queries for BOLT Information Retrieval
Kira Griffitt | Stephanie Strassel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The DARPA BOLT Information Retrieval evaluations target open-domain natural-language queries over a large corpus of informal text in English, Chinese and Egyptian Arabic. We outline the goals of BOLT IR, comparing it with the prior GALE Distillation task. After discussing the properties of the BOLT IR corpus, we provide a detailed description of the query creation process, contrasting the summary query format presented to systems at run time with the full query format created by annotators. We describe the relevance criteria used to assess BOLT system responses, highlighting the evolution of the procedures used over the three evaluation phases. We provide a detailed review of the decision points model for relevance assessment introduced during Phase 2, and conclude with information about inter-assessor consistency achieved with the decision points assessment model.
2013
Abstract Meaning Representation for Sembanking
Laura Banarescu | Claire Bonial | Shu Cai | Madalina Georgescu | Kira Griffitt | Ulf Hermjakob | Kevin Knight | Philipp Koehn | Martha Palmer | Nathan Schneider
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse
2012
Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual
Xuansong Li | Stephanie Strassel | Heng Ji | Kira Griffitt | Joe Ellis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
To advance information extraction and question answering technologies toward a more realistic path, the U.S. NIST (National Institute of Standards and Technology) initiated the KBP (Knowledge Base Population) task as one of the TAC (Text Analysis Conference) evaluation tracks. It aims to encourage research in automatic information extraction of named entities from unstructured texts with the ultimate goal of integrating such information into a structured Knowledge Base. The KBP track consists of two types of evaluation: Named Entity Linking (NEL) and Slot Filling. This paper describes the linguistic resource creation efforts at the Linguistic Data Consortium (LDC) in support of Named Entity Linking evaluation of KBP, focusing on annotation methodologies, process, and features of corpora from 2009 to 2011, with a highlighted analysis of the cross-lingual NEL data. Progressing from monolingual to cross-lingual Entity Linking technologies, the 2011 cross-lingual NEL evaluation targeted multilingual capabilities. Annotation accuracy is presented in comparison with system performance, with promising results from cross-lingual entity linking systems.
Annotation Trees: LDC’s customizable, extensible, scalable, annotation infrastructure
Jonathan Wright | Kira Griffitt | Joe Ellis | Stephanie Strassel | Brendan Callahan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In recent months, LDC has developed a web-based annotation infrastructure centered around a tree model of annotations and a Ruby on Rails application called the LDC User Interface (LUI). The effort aims to centralize all annotation into this single platform, which means annotation is always available remotely, with no more software required than a web browser. While the design is monolithic in the sense of handling any number of annotation projects, it is also scalable, as it is distributed over many physical and virtual machines. Furthermore, minimizing customization was a core design principle, and new functionality can be plugged in without writing a full application. The creation and customization of GUIs is itself done through the web interface, without writing code, with the aim of eventually allowing project managers to create a new task without developer intervention. Many of the desirable features follow from the model of annotations as trees, and the operationalization of annotation as tree modification.