2014
Corpus and Evaluation of Handwriting Recognition of Historical Genealogical Records
Patrick Schone | Heath Nielson | Mark Ward
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Over the last few decades, significant strides have been made in handwriting recognition (HR), which is the automatic transcription of handwritten documents. HR often focuses on modern handwritten material, but in the electronic age, the volume of handwritten material is rapidly declining. However, we believe HR is on the verge of having major application to historical record collections. In recent years, archives and genealogical organizations have conducted huge campaigns to transcribe valuable historical record content with such transcription being largely done through human-intensive labor. HR has the potential of revolutionizing these transcription endeavors. To test the hypothesis that this technology is close to applicability, and to provide a testbed for reducing any accuracy gaps, we have developed an evaluation paradigm for historical record handwriting recognition. We created a huge test corpus consisting of four historical data collections of four differing genres and three languages. In this paper, we provide the details of these extensive resources which we intend to release to the research community for further study. Since several research organizations have already participated in this evaluation, we also show initial results and comparisons to human levels of performance.
2010
An Evaluation of Technologies for Knowledge Base Population
Paul McNamee | Hoa Trang Dang | Heather Simpson | Patrick Schone | Stephanie M. Strassel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Previous content extraction evaluations have neglected to address problems which complicate the incorporation of extracted information into an existing knowledge base. Previous question answering evaluations have likewise avoided tasks such as explicit disambiguation of target entities and handling a fixed set of questions about entities without previous determination of possible answers. In 2009 NIST conducted a Knowledge Base Population track at its Text Analysis Conference to unite the content extraction and question answering communities and jointly explore some of these issues. This exciting new evaluation attracted 13 teams from 6 countries that submitted results in two tasks, Entity Linking and Slot Filling. This paper explains the motivation and design of the tasks, describes the language resources that were developed for this evaluation, offers comparisons to previous community evaluations, and briefly summarizes the performance obtained by systems. We also identify relevant issues pertaining to target selection, challenging queries, and performance measures.
2008
Learning Named Entity Hyponyms for Question Answering
Paul McNamee | Rion Snow | Patrick Schone | James Mayfield
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II
Mining Wiki Resources for Multilingual Named Entity Recognition
Alexander E. Richman | Patrick Schone
Proceedings of ACL-08: HLT
2001
Knowledge-Free Induction of Inflectional Morphologies
Patrick Schone | Daniel Jurafsky
Second Meeting of the North American Chapter of the Association for Computational Linguistics
Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem?
Patrick Schone | Daniel Jurafsky
Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing
2000
Knowledge-Free Induction of Morphology Using Latent Semantic Analysis
Patrick Schone | Daniel Jurafsky
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop