Heather Simpson


2010

pdf
An Evaluation of Technologies for Knowledge Base Population
Paul McNamee | Hoa Trang Dang | Heather Simpson | Patrick Schone | Stephanie M. Strassel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Previous content extraction evaluations have neglected to address problems which complicate the incorporation of extracted information into an existing knowledge base. Previous question answering evaluations have likewise avoided tasks such as explicit disambiguation of target entities and handling a fixed set of questions about entities without previous determination of possible answers. In 2009 NIST conducted a Knowledge Base Population track at its Text Analysis Conference to unite the content extraction and question answering communities and jointly explore some of these issues. This exciting new evaluation attracted 13 teams from 6 countries that submitted results in two tasks, Entity Linking and Slot Filling. This paper explains the motivation and design of the tasks, describes the language resources that were developed for this evaluation, offers comparisons to previous community evaluations, and briefly summarizes the performance obtained by systems. We also identify relevant issues pertaining to target selection, challenging queries, and performance measures.

pdf
The DARPA Machine Reading Program - Encouraging Linguistic and Reasoning Research with a Series of Reading Tasks
Stephanie Strassel | Dan Adams | Henry Goldberg | Jonathan Herr | Ron Keesing | Daniel Oblinger | Heather Simpson | Robert Schrag | Jonathan Wright
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The goal of DARPA’s Machine Reading (MR) program is nothing less than making the world’s natural language corpora available for formal processing. Most text processing research has focused on locating mission-relevant text (information retrieval) and on techniques for enriching text by transforming it to other forms of text (translation, summarization) ― always for use by humans. In contrast, MR will make knowledge contained in text available in forms that machines can use for automated processing. This will be done with little human intervention. Machines will learn to read from a few examples and they will read to learn what they need in order to answer questions or perform some reasoning task. Three independent Reading Teams are building universal text engines which will capture knowledge from naturally occurring text and transform it into the formal representations used by Artificial Intelligence. An Evaluation Team is selecting and annotating text corpora with task domain concepts, creating model reasoning systems with which the reading systems will interact, and establishing question-answer sets and evaluation protocols to measure progress toward this goal. We describe development of the MR evaluation framework, including test protocols, linguistic resources and technical infrastructure.

pdf
Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population
Heather Simpson | Stephanie Strassel | Robert Parker | Paul McNamee
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The Text Analysis Conference (TAC) is a series of Natural Language Processing evaluation workshops organized by the National Institute of Standards and Technology. The Knowledge Base Population (KBP) track at TAC 2009, a hybrid descendant of the TREC Question Answering track and the Automated Content Extraction (ACE) evaluation program, is designed to support development of systems that are capable of automatically populating a knowledge base with information about entities mined from unstructured text. An important component of the KBP evaluation is the Entity Linking task, where systems must accurately associate text mentions of unknown Person (PER), Organization (ORG), and Geopolitical (GPE) names to entries in a knowledge base. Linguistic Data Consortium (LDC) at the University of Pennsylvania creates and distributes linguistic resources including data, annotations, system assessment, tools and specifications for the TAC KBP evaluations. This paper describes the 2009 resource creation efforts, with particular focus on the selection and development of named entity mentions for the Entity Linking task evaluation.

2009

pdf
Basic Language Resources for Diverse Asian Languages: A Streamlined Approach for Resource Creation
Heather Simpson | Kazuaki Maeda | Christopher Cieri
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)