STARK: A Toolkit for Dependency (Sub)Tree Extraction and Analysis

Luka Krsnik, Kaja Dobrovoljc


Abstract
We present STARK, a lightweight and flexible Python toolkit for extracting and analyzing syntactic (sub)trees from dependency-parsed corpora. By systematically slicing each sentence into interpretable syntactic units based on configurable parameters, STARK enables bottom-up, data-driven exploration of syntactic patterns at multiple levels of abstraction—from fully lexicalized constructions to general structural templates. It supports any CoNLL-U-formatted corpus and is available as a command-line tool, Python library, and interactive online demo, ensuring seamless integration into both exploratory and large-scale corpus workflows. We illustrate its functionality through case studies in noun phrase analysis, multiword expression identification, and syntactic variation across corpora, demonstrating its utility for a wide range of corpus-driven syntactic investigations.
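As a rough illustration of what slicing a dependency-parsed sentence into (sub)trees can look like, the sketch below reads one toy CoNLL-U sentence and counts the complete subtree headed by each token that has dependents. It is not STARK's implementation or API: the function names (`parse_sentence`, `subtree_patterns`), the bracketed output notation, and the `lexicalized` switch are assumptions made for this example only.

```python
from collections import Counter, defaultdict

# Toy CoNLL-U sentence; only ID, FORM, UPOS, HEAD and DEPREL are used here.
CONLLU = (
    "1\tThe\tthe\tDET\t_\t_\t3\tdet\t_\t_\n"
    "2\tquick\tquick\tADJ\t_\t_\t3\tamod\t_\t_\n"
    "3\tfox\tfox\tNOUN\t_\t_\t4\tnsubj\t_\t_\n"
    "4\tjumps\tjump\tVERB\t_\t_\t0\troot\t_\t_\n"
)


def parse_sentence(block):
    """Parse one CoNLL-U sentence block into a list of token dicts."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def subtree_patterns(tokens, lexicalized=False):
    """Count the complete subtree under every token that has dependents.

    Hypothetical helper: with lexicalized=False nodes are rendered as UPOS
    tags (a structural template); with lexicalized=True as word forms.
    """
    children = defaultdict(list)
    for tok in tokens:
        children[tok["head"]].append(tok)

    def render(tok):
        label = tok["form"] if lexicalized else tok["upos"]
        deps = sorted(children.get(tok["id"], []), key=lambda t: t["id"])
        if not deps:
            return label
        inner = " ".join(f"{d['deprel']}:{render(d)}" for d in deps)
        return f"{label}({inner})"

    return Counter(render(t) for t in tokens if children.get(t["id"]))


patterns = subtree_patterns(parse_sentence(CONLLU))
lexical = subtree_patterns(parse_sentence(CONLLU), lexicalized=True)
for pattern, freq in patterns.most_common():
    print(freq, pattern)
```

Switching `lexicalized` moves the same tree between a fully lexicalized rendering and an abstract part-of-speech template, which loosely mirrors the levels of abstraction the abstract mentions. A real toolkit would additionally handle tree-size limits, word-order options, and corpus-scale input.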
Anthology ID:
2025.tlt-1.5
Volume:
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Sarah Jablotschkin, Sandra Kübler, Heike Zinsmeister
Venues:
TLT | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
44–51
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.tlt-1.5/
Cite (ACL):
Luka Krsnik and Kaja Dobrovoljc. 2025. STARK: A Toolkit for Dependency (Sub)Tree Extraction and Analysis. In Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025), pages 44–51, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
STARK: A Toolkit for Dependency (Sub)Tree Extraction and Analysis (Krsnik & Dobrovoljc, TLT-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.tlt-1.5.pdf