John Bauer

2023

pdf abs
Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs
John Bauer | Chloé Kiddon | Eric Yeh | Alex Shan | Christopher D. Manning
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)

Searching dependency graphs and manipulating them can be a time consuming and challenging task to get right. We document Semgrex, a system for searching dependency graphs, and introduce Ssurgeon, a system for manipulating the output of Semgrex. The compact language used by these systems allows for easy command line or API processing of dependencies. Additionally, integration with publicly released toolkits in Java and Python allows for searching text relations and attributes over natural text.

2014

pdf
The Stanford CoreNLP Natural Language Processing Toolkit
Christopher Manning | Mihai Surdeanu | John Bauer | Jenny Finkel | Steven Bethard | David McClosky
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a gold standard annotation of syntactic dependencies in the English Web Treebank corpus using the Stanford Dependencies formalism. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for English informal text genres. We also present experiments on the use of this resource, both for training dependency parsers and for evaluating the quality of different versions of the Stanford Parser, which includes a converter tool to produce dependency annotation from constituency trees. We show that training a dependency parser on a mix of newswire and web data leads to better performance on that type of data without hurting performance on newswire text, and therefore gold standard annotations for non-canonical text can be a valuable resource for parsing. Furthermore, the systematic annotation effort has informed both the SD formalism and its implementation in the Stanford Parser’s dependency converter. In response to the challenges encountered by annotators in the EWT corpus, the formalism has been revised and extended, and the converter has been improved.

John Bauer

2023

2014

2013

2011

Co-authors

Venues