ComparaTree: A Multi-Level Comparative Treebank Analysis Tool

Luka Terčon, Kaja Dobrovoljc


Abstract
ComparaTree is a tool for comparative treebank analysis that combines several methods of quantitative linguistic analysis to give a general overview of the differences and similarities between two treebanks. It covers several levels of linguistic analysis, summarizing how the two treebanks differ in lexical diversity, n-gram diversity, part-of-speech and dependency relation proportions, syntactic complexity, and syntactic diversity. We explain the quantitative analyses performed at each level, along with the generation of graphical visualizations that enable user-friendly comparisons at a glance. We illustrate the comparison process with the results the tool produces when comparing two treebanks from the Universal Dependencies collection.
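To make two of the comparison levels named in the abstract concrete, the following is a minimal Python sketch of a type-token ratio and of part-of-speech and dependency relation proportions computed over two CoNLL-U inputs. It is not ComparaTree's implementation: the function names (`make_conllu`, `read_tokens`, `proportions`, `summarize`, `compare`), the choice of type-token ratio as the lexical diversity measure, and the two toy sentences are all assumptions made for illustration.

```python
from collections import Counter


def make_conllu(words):
    """Build a one-sentence CoNLL-U string from (form, upos, head, deprel) tuples (toy data helper)."""
    rows = []
    for i, (form, upos, head, deprel) in enumerate(words, start=1):
        rows.append("\t".join([str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]))
    return "\n".join(rows) + "\n"


def read_tokens(conllu_text):
    """Yield (form, upos, deprel) for each syntactic word in CoNLL-U text."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        yield cols[1], cols[3], cols[7]


def proportions(labels):
    """Map each label to its share of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def summarize(conllu_text):
    """Compute a few simple per-treebank statistics (assumed measures, see lead-in)."""
    tokens = list(read_tokens(conllu_text))
    forms = [form.lower() for form, _, _ in tokens]
    return {
        "tokens": len(tokens),
        "ttr": len(set(forms)) / len(forms) if forms else 0.0,  # type-token ratio
        "upos": proportions(upos for _, upos, _ in tokens),
        "deprel": proportions(deprel for _, _, deprel in tokens),
    }


def compare(a, b, key):
    """Difference in proportion (a minus b) for every label seen in either summary."""
    labels = sorted(set(a[key]) | set(b[key]))
    return {label: a[key].get(label, 0.0) - b[key].get(label, 0.0) for label in labels}


# Toy inputs standing in for two treebanks (invented for this sketch).
TREEBANK_A = make_conllu([
    ("The", "DET", 2, "det"),
    ("dog", "NOUN", 3, "nsubj"),
    ("barks", "VERB", 0, "root"),
])
TREEBANK_B = make_conllu([
    ("The", "DET", 2, "det"),
    ("cat", "NOUN", 3, "nsubj"),
    ("saw", "VERB", 0, "root"),
    ("the", "DET", 5, "det"),
    ("dog", "NOUN", 3, "obj"),
])

if __name__ == "__main__":
    a, b = summarize(TREEBANK_A), summarize(TREEBANK_B)
    print("TTR:", a["ttr"], b["ttr"])
    print("UPOS proportion differences:", compare(a, b, "upos"))
    print("DEPREL proportion differences:", compare(a, b, "deprel"))
```

Raw type-token ratio is sensitive to text length, so a comparison of treebanks of different sizes would normally need a length-normalized measure; the sketch keeps the raw ratio only for brevity.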
Anthology ID:
2025.tlt-1.15
Volume:
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Sarah Jablotschkin, Sandra Kübler, Heike Zinsmeister
Venues:
TLT | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
129–139
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.tlt-1.15/
Cite (ACL):
Luka Terčon and Kaja Dobrovoljc. 2025. ComparaTree: A Multi-Level Comparative Treebank Analysis Tool. In Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025), pages 129–139, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
ComparaTree: A Multi-Level Comparative Treebank Analysis Tool (Terčon & Dobrovoljc, TLT-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.tlt-1.15.pdf