On the Flatness, Non-linearity, and Branching Direction of Natural Language and Random Constituency Trees: Analyzing Structural Variation within and across Languages

Taiga Ishii, Yusuke Miyao


Abstract
Natural languages exhibit remarkable diversity in their syntactic structures. Previous research has investigated the cross-lingual differences in local structural features such as word order or dependency relations. However, considering structural variation within individual language, it remains unclear how such features influence the variation in the overall constituency tree structure and hence the structural variation across languages. To this end, we focus on the shape of constituency trees, analyzing the cross-lingual overlap in the distributions of flatness, non-linearity, and branching direction. While acknowledging that the findings may be influenced by the potential annotation idiosyncrasies across treebanks, the experiments quantitatively suggest that flatness and branching direction vary significantly across languages. As for non-linearity, the cross-lingual difference was relatively small, and the distributions tend to skew towards linear structures. Furthermore, comparison with randomly generated trees suggests that while phrase category and frequency information is crucial for reproducing the branching direction found in natural languages, non-linearity can be replicated reasonably well even without such information.
Anthology ID:
2025.quasy-1.12
Volume:
Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Xinying Chen, Yaqin Wang
Venues:
Quasy | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Note:
Pages:
90–104
Language:
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.quasy-1.12/
DOI:
Bibkey:
Cite (ACL):
Taiga Ishii and Yusuke Miyao. 2025. On the Flatness, Non-linearity, and Branching Direction of Natural Language and Random Constituency Trees: Analyzing Structural Variation within and across Languages. In Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025), pages 90–104, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
On the Flatness, Non-linearity, and Branching Direction of Natural Language and Random Constituency Trees: Analyzing Structural Variation within and across Languages (Ishii & Miyao, Quasy-SyntaxFest 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.quasy-1.12.pdf