Graph Databases for Fast Queries in UD Treebanks

Niklas Deworetzki, Peter Ljunglöf


Abstract
We investigate whether labeled property graphs and graph databases can be a useful and efficient way of encoding UD treebanks to facilitate searching for complex syntactic phenomena. We give two alternative encodings of UD treebanks into the off-the-shelf graph database Neo4j and show how to translate syntactic queries into the graph query language Cypher. Our evaluation shows that graph databases can improve query times by several orders of magnitude compared to existing approaches.
Anthology ID:
2025.tlt-1.4
Volume:
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Sarah Jablotschkin, Sandra Kübler, Heike Zinsmeister
Venues:
TLT | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
32–43
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.tlt-1.4/
Cite (ACL):
Niklas Deworetzki and Peter Ljunglöf. 2025. Graph Databases for Fast Queries in UD Treebanks. In Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025), pages 32–43, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
Graph Databases for Fast Queries in UD Treebanks (Deworetzki & Ljunglöf, TLT-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.tlt-1.4.pdf