First Insights into the Syntax of Slovene Student Writing: A Statistical Analysis of Šolar 3.0 vs. Učbeniki 1.0

Tina Munda, Špela Arhar Holdt


Abstract
This study investigates the syntactic features of Slovene student writing by comparing essays from the Šolar 3.0 corpus (ages 13–19; primary and secondary school levels) with textbook texts from the Učbeniki 1.0 corpus aligned to the same educational stages. We apply quantitative syntactic analysis at two complementary levels: clause-type frequency (coordination, parataxis, and four types of subordination) and tree-based syntactic complexity measures (number of clauses, clauses per T-unit, and maximum parse-tree depth). Results show that students rely heavily on coordination and on particular subordinate clause types (especially object and adverbial clauses), producing more clauses per sentence and per T-unit than textbooks do. However, their sentences tend to have flatter syntactic structures, with shallower embedding in primary school and only modest increases in tree depth by secondary school. These findings reveal a divergence between surface-level complexity and hierarchical depth, highlighting developmental trends and instructional targets in written syntactic maturity. We conclude by discussing implications for syntactic development and directions for future research.
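To make the two levels of analysis named in the abstract concrete, the following is a minimal sketch of how clause-type counts and maximum tree depth could be computed from a dependency parse. It is not the authors' implementation. It assumes Universal Dependencies style annotation, and the mapping of clause types to relations (`conj` on verbal tokens for coordination, `parataxis`, and `csubj`, `ccomp`, `advcl`, `acl` as the four subordinate types) is an illustrative guess, as are all function names and the toy English sentence.

```python
from collections import Counter

# ASSUMPTION: these UD relations stand in for the clause types in the abstract;
# the paper's actual operationalization may differ.
SUBORDINATE = {"csubj", "ccomp", "advcl", "acl"}
VERBAL_UPOS = {"VERB", "AUX"}


def max_tree_depth(heads):
    """Return the largest number of edges between the root and any token.

    heads[i] is the 1-based head index of token i+1; 0 marks the root.
    Assumes a well-formed tree (exactly one root, no cycles).
    """
    def depth(tok):
        d = 0
        while heads[tok - 1] != 0:
            tok = heads[tok - 1]
            d += 1
        return d

    return max(depth(t) for t in range(1, len(heads) + 1))


def clause_type_counts(tokens):
    """Count clausal dependents by relation, ignoring relation subtypes."""
    counts = Counter()
    for tok in tokens:
        rel = tok["deprel"].split(":")[0]
        if rel in SUBORDINATE or rel == "parataxis":
            counts[rel] += 1
        elif rel == "conj" and tok["upos"] in VERBAL_UPOS:
            # Only verbal conjuncts are treated as coordinated clauses.
            counts["conj"] += 1
    return counts


# Hypothetical toy parse of "I think that he sleeps".
tokens = [
    {"form": "I", "head": 2, "deprel": "nsubj", "upos": "PRON"},
    {"form": "think", "head": 0, "deprel": "root", "upos": "VERB"},
    {"form": "that", "head": 5, "deprel": "mark", "upos": "SCONJ"},
    {"form": "he", "head": 5, "deprel": "nsubj", "upos": "PRON"},
    {"form": "sleeps", "head": 2, "deprel": "ccomp", "upos": "VERB"},
]

heads = [t["head"] for t in tokens]
counts = clause_type_counts(tokens)
# One root clause plus one clause per counted clausal dependent.
num_clauses = 1 + sum(counts.values())

print(max_tree_depth(heads), dict(counts), num_clauses)
```

The example illustrates why the two levels can diverge: adding coordinated clauses raises the clause count without necessarily adding depth, since each conjunct attaches directly to the first one, whereas nested subordination lengthens the path from the root.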
Anthology ID:
2025.quasy-1.13
Volume:
Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Xinying Chen, Yaqin Wang
Venues:
Quasy | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
105–114
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.quasy-1.13/
Cite (ACL):
Tina Munda and Špela Arhar Holdt. 2025. First Insights into the Syntax of Slovene Student Writing: A Statistical Analysis of Šolar 3.0 vs. Učbeniki 1.0. In Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025), pages 105–114, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
First Insights into the Syntax of Slovene Student Writing: A Statistical Analysis of Šolar 3.0 vs. Učbeniki 1.0 (Munda & Arhar Holdt, Quasy-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.quasy-1.13.pdf