Syntactic units and their length distributions: A case study in Czech

Michaela Nogolová; Michaela Koščová; Ján Mačutek; Radek Čech

Abstract

This study investigates the length distributions of syntactic units in Czech at several hierarchical levels: sentences, independent clauses, clauses, phrases, subphrases, and chunks. Using a diverse dataset (Universal Dependencies treebanks, presidential speeches, the Czech Bible, and a random sample from corpora of modern Czech), the analysis examines whether the lengths of these units follow consistent distributional patterns. The length of a unit is defined as the number of its immediate subunits, and the length distributions are modeled with the hyper-Poisson distribution. The results show that the hyper-Poisson model fits the length distributions of all the above-mentioned syntactic units well, pointing to a common principle underlying the organization of syntactic structure in Czech.
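The abstract names the hyper-Poisson distribution without giving its form. In its standard parametrization it assigns P(X = k) = a^k / (b^(k) · 1F1(1; b; a)) for k = 0, 1, 2, ..., where b^(k) = b(b+1)...(b+k-1) is the rising factorial and 1F1 is the confluent hypergeometric function; with b = 1 it reduces to the Poisson distribution. Because every unit has at least one immediate subunit, length models in quantitative linguistics are often shifted by one, but which variant the authors fitted is not stated here. The following Python sketch is therefore only illustrative: the function names and the `shift` parameter are hypothetical, and no parameter values from the paper are implied.

```python
def hyper_poisson_pmf(x: int, a: float, b: float, shift: int = 0, terms: int = 500) -> float:
    """P(X = x) for a hyper-Poisson distribution displaced by `shift` (hypothetical helper).

    P(X = x) = a**k / (b^(k) * 1F1(1; b; a)) with k = x - shift >= 0,
    where b^(k) = b(b+1)...(b+k-1) is the rising factorial.
    """
    k = x - shift
    if k < 0:
        return 0.0  # values below the displacement have zero probability
    # Normalizing constant 1F1(1; b; a) = sum_{j>=0} a**j / b^(j), truncated after `terms` terms.
    # Successive terms satisfy t_{j+1} = t_j * a / (b + j), starting from t_0 = 1.
    norm = 0.0
    term = 1.0
    for j in range(terms):
        norm += term
        term *= a / (b + j)
    # Numerator a**k / b^(k), built with the same recurrence.
    num = 1.0
    for j in range(k):
        num *= a / (b + j)
    return num / norm


def expected_frequencies(n_total: int, xs, a: float, b: float, shift: int = 0):
    """Expected counts of units of each length in xs for a sample of n_total units (hypothetical)."""
    return [n_total * hyper_poisson_pmf(x, a, b, shift) for x in xs]


# Illustrative parameters only, not estimates from the paper: a 1-displaced model,
# since a unit of length 0 (no immediate subunits) cannot occur.
print([round(f, 1) for f in expected_frequencies(1000, range(1, 8), a=1.5, b=2.3, shift=1)])
```

Computing the normalizing series by the term recurrence avoids evaluating large factorials or powers directly. In an actual fit, `a` and `b` would be estimated from the observed length frequencies, and the expected counts compared with the observed ones using a goodness-of-fit measure.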

Anthology ID: 2025.quasy-1.14
Volume: Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025)
Month: August
Year: 2025
Address: Ljubljana, Slovenia
Editors: Xinying Chen, Yaqin Wang
Venues: Quasy | WS | SyntaxFest
SIG: SIGPARSE
Publisher: Association for Computational Linguistics
Pages: 115–123
URL: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.quasy-1.14/
Cite (ACL): Michaela Nogolová, Michaela Koščová, Ján Mačutek, and Radek Čech. 2025. Syntactic units and their length distributions: A case study in Czech. In Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025), pages 115–123, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal): Syntactic units and their length distributions: A case study in Czech (Nogolová et al., Quasy-SyntaxFest 2025)
PDF: https://preview.aclanthology.org/mtsummit-25-ingestion/2025.quasy-1.14.pdf