2023
Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs
John Bauer | Chloé Kiddon | Eric Yeh | Alex Shan | Christopher D. Manning
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)
Searching and manipulating dependency graphs can be time-consuming and difficult to get right. We document Semgrex, a system for searching dependency graphs, and introduce Ssurgeon, a system for manipulating the output of Semgrex. The compact language used by these systems allows for easy command line or API processing of dependencies. Additionally, integration with publicly released toolkits in Java and Python allows for searching text relations and attributes over natural text.
2017
Discourse-Wide Extraction of Assay Frames from the Biological Literature
Dayne Freitag | Paul Kalmar | Eric Yeh
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
We consider the problem of populating multi-part knowledge frames from textual information distributed over multiple sentences in a document. We present a corpus constructed by aligning papers from the cellular signaling literature to a collection of approximately 50,000 reference frames curated by hand as part of a decade-long project. We present and evaluate two approaches to the challenging problem of reconstructing these frames, which formalize biological assays described in the literature. One approach is based on classifying candidate records nominated by sentence-local entity co-occurrence. In the second approach, we introduce a novel virtual register machine that traverses an article and generates frames, trained on our reference data. Our evaluations show that success in the task ultimately hinges on an integration of evidence spread across the discourse.
2016
An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
Eric Yeh | John Niekrasz | Dayne Freitag | Richard Rohwer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists. Such regions often encode information according to ad hoc schemas and avail themselves of visual cues in place of natural language grammar, presenting problems for standard information extraction algorithms. Unlike previous work in table extraction, which assumes a relatively noiseless two-dimensional layout, our aim is to accommodate a wide variety of naturally occurring structure types. Our approach has three main parts. First, we collect and annotate a diverse sample of “naturally” occurring structures from several sources. Second, we use probabilistic text segmentation techniques, featurized by skip bigrams over spatial and token category cues, to automatically identify contiguous regions of structured text that share a common schema. Finally, we identify the records and fields within each structured region using a combination of distributional similarity and sequence alignment methods, guided by minimal supervision in the form of a single annotated record. We evaluate the last two components individually, and conclude with a discussion of further work.
2013
SRIUBC-Core: Multiword Soft Similarity Models for Textual Similarity
Eric Yeh
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity
2012
SRIUBC: Simple Similarity Features for Semantic Textual Similarity
Eric Yeh | Eneko Agirre
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)
2009
WikiWalk: Random walks on Wikipedia for Semantic Relatedness
Eric Yeh | Daniel Ramage | Christopher D. Manning | Eneko Agirre | Aitor Soroa
Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4)
2007
Learning Alignments and Leveraging Natural Logic
Nathanael Chambers | Daniel Cer | Trond Grenager | David Hall | Chloe Kiddon | Bill MacCartney | Marie-Catherine de Marneffe | Daniel Ramage | Eric Yeh | Christopher D. Manning
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing