Paul Kalmar
2022
Valet: Rule-Based Information Extraction for Rapid Deployment
Dayne Freitag | John Cadigan | Robert Sasseen | Paul Kalmar
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We present VALET, a framework for rule-based information extraction written in Python. VALET departs from legacy approaches predicated on cascading finite-state transducers, instead offering direct support for mixing heterogeneous information (lexical, orthographic, syntactic, corpus-analytic) in a succinct syntax that supports context-free idioms. We show how a handful of rules suffices to implement sophisticated matching, and describe a user interface that facilitates exploration for development and maintenance of rule sets. Arguing that rule-based information extraction is an important methodology early in the development cycle, we describe an experiment in which a VALET model is used to annotate examples for a machine learning extraction model. While learning to emulate the extraction rules, the resulting model generalizes them, recognizing valid extraction targets the rules failed to detect.
2017
Discourse-Wide Extraction of Assay Frames from the Biological Literature
Dayne Freitag | Paul Kalmar | Eric Yeh
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
We consider the problem of populating multi-part knowledge frames from textual information distributed over multiple sentences in a document. We present a corpus constructed by aligning papers from the cellular signaling literature to a collection of approximately 50,000 reference frames curated by hand as part of a decade-long project. We present and evaluate two approaches to the challenging problem of reconstructing these frames, which formalize biological assays described in the literature. One approach is based on classifying candidate records nominated by sentence-local entity co-occurrence. In the second approach, we introduce a novel virtual register machine that traverses an article and generates frames, trained on our reference data. Our evaluations show that success in the task ultimately hinges on an integration of evidence spread across the discourse.
2007
FICO: Web Person Disambiguation Via Weighted Similarity of Entity Contexts
Paul Kalmar | Matthias Blume
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)