Niklas Deworetzki
2026
Syntactic Sugar for Syntactic Queries: Sequential Representations for Dependency Queries
Niklas Deworetzki | Arianna Masciolini
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Syntactic query languages such as Grew and dep_search allow looking for grammatical patterns in linguistically annotated corpora. However, these languages are often unsupported by large-scale corpus management tools, where queries are essentially sequential in nature. In this paper, we present CQP/Tree, a tool to convert syntactic queries into CQL, the Corpus Query Language used in Corpus Workbench, SketchEngine, Korp and several other such systems. In this framework, syntactic queries act as _syntactic sugar_: they allow expressing complex CQL queries in a more readable and concise fashion, thus bridging the gap between expressive linguistic search and large-scale corpora. CQP/Tree is available as a web and command-line tool, as well as an open-source Python library.
2025
Graph Databases for Fast Queries in UD Treebanks
Niklas Deworetzki | Peter Ljunglöf
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)
We investigate whether labeled property graphs and graph databases can be a useful and efficient way of encoding UD treebanks to facilitate searching for complex syntactic phenomena. We give two alternative encodings of UD treebanks into the off-the-shelf graph database Neo4j, and show how to translate syntactic queries into the graph query language Cypher. Our evaluation shows that graph databases can improve query times by several orders of magnitude compared to existing approaches.