A corpus-driven description of OV order in Archaic Chinese

Qishen Wu, Santiago Herrera, Pierre Magistry, Sylvain Kahane


Abstract
This paper presents a quantitative study of Object-Verb (OV) order in Archaic Chinese based on a Universal Dependencies (UD) treebank. Treating word order as a binary choice (OV vs. VO), we train a sparse logistic-regression classifier that selects the most salient syntactic features needed for accurate prediction. We use it to investigate the specific syntactic contexts that allow OV order and to identify the extent to which these factors favour it. The ranked features are treated as interpretable rules, and their coverage and precision as quantitative properties of each rule. The approach confirms earlier qualitative findings (e.g. pronoun object fronting and negation favour OV) and uncovers new contrasts in word order between different reflexive pronouns. It also identifies annotation errors, which we corrected for the final analysis, illustrating how quantitative models, combined with fine-grained corpus analysis, can improve treebank quality. Our study demonstrates that lightweight machine-learning techniques applied to an existing syntactic resource can reveal fine-grained patterns in historical word order, and that the approach can be reapplied to other languages.
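To make the idea of reading a feature as a rule with a precision and a coverage more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy instances, the feature names (`obj_is_pronoun`, `verb_negated`) and the helper `rule_stats` are hypothetical, and the definitions used (precision as the share of matching instances that are OV, coverage as the share of all OV instances that the rule matches) are assumptions, since the abstract does not spell them out. The sparse classifier itself is left out; the sketch only scores rules that such a model might select.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instances: each verb-object pair carries binary features
# and an observed order. Feature names are illustrative, not the paper's.
Instance = Tuple[Dict[str, bool], str]

DATA: List[Instance] = [
    ({"obj_is_pronoun": True, "verb_negated": True}, "OV"),
    ({"obj_is_pronoun": True, "verb_negated": True}, "OV"),
    ({"obj_is_pronoun": True, "verb_negated": False}, "VO"),
    ({"obj_is_pronoun": False, "verb_negated": True}, "VO"),
    ({"obj_is_pronoun": False, "verb_negated": False}, "VO"),
    ({"obj_is_pronoun": True, "verb_negated": False}, "OV"),
]


def rule_stats(
    data: List[Instance], feature: str, target: str = "OV"
) -> Tuple[float, float]:
    """Score the rule 'feature is present => order is target'.

    Assumed definitions (not taken from the paper):
      precision = matching instances with the target order / matching instances
      coverage  = matching instances with the target order / all target instances
    """
    # Labels of the instances where the feature is present.
    matched = [label for feats, label in data if feats.get(feature, False)]
    hits = sum(1 for label in matched if label == target)
    total_target = sum(1 for _, label in data if label == target)
    precision = hits / len(matched) if matched else 0.0
    coverage = hits / total_target if total_target else 0.0
    return precision, coverage


if __name__ == "__main__":
    for name in ("obj_is_pronoun", "verb_negated"):
        precision, coverage = rule_stats(DATA, name)
        print(f"{name}: precision={precision:.2f} coverage={coverage:.2f}")
```

On this toy data, a rule with high precision but low coverage would describe a narrow context that reliably triggers OV, while a rule with high coverage but lower precision would describe a broad context that only partly accounts for it.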
Anthology ID:
2025.depling-1.13
Volume:
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Eva Hajičová, Sylvain Kahane
Venues:
DepLing | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
130–139
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.depling-1.13/
Cite (ACL):
Qishen Wu, Santiago Herrera, Pierre Magistry, and Sylvain Kahane. 2025. A corpus-driven description of OV order in Archaic Chinese. In Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025), pages 130–139, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
A corpus-driven description of OV order in Archaic Chinese (Wu et al., DepLing-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.depling-1.13.pdf