Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)
Eva Hajičová | Sylvain Kahane
A Typology of Non-Projective Patterns in Unas’s and Teti’s Pyramid Texts
Roberto A. Diaz Hernandez
This paper studies the use of non-projective structures in Unas’s and Teti’s Pyramid Texts (ca. 2321–2279 BC) as annotated in the Egyptian-UJaen treebank. It offers the first typology of non-projective patterns in Old Egyptian and discusses the causes of non-projectivity in the Old Egyptian of these texts, concluding that non-projectivity is an exceptional phenomenon in them.
Tracing Syntactic Complexity: Exploring the Evolution of Average Dependency Length Across Three Centuries of Scientific English
Marie-Pauline Krielke | Diego Alves | Luigi Talamo
We present a diachronic analysis of syntactic change in a corpus covering over 300 years (1665–1996) of scientific English, annotated with Universal Dependencies (UD) and Dependency Length (DL). We trace the development of average Dependency Length (aDL) as a measure of syntactic complexity across this period. We describe the construction of the corpus and report on the evaluation of the UD annotation. We find that aDL initially decreases toward the 19th century, but then increases significantly in the 20th century. We show that this highly aggregate measure masks the underlying mechanisms driving changes in syntactic complexity. A more fine-grained analysis of the dependency relations involved in these changes reveals that the increasing use of (multi-word) compounds is a dominant source of long, leftward-expanded noun phrases. This leads to an expansion of syntactic dependencies both within and beyond the noun phrase. The results offer a new perspective on syntactic complexity, shifting the focus from the sentence level to the phrasal level.
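As a rough illustration of the aDL measure mentioned in this abstract, the sketch below computes dependency lengths from a list of head indices. It assumes the common definition (absolute difference between the positions of a token and its head, with the root excluded); the abstract does not say how the authors treat punctuation or other details, and the function names are hypothetical.

```python
from statistics import mean


def dependency_lengths(heads):
    """Return the length of every non-root dependency.

    heads[i] is the 1-based position of the head of token i+1;
    0 marks the root, which has no incoming dependency to measure.
    """
    return [abs(head - pos) for pos, head in enumerate(heads, start=1) if head != 0]


def average_dependency_length(heads):
    """Mean dependency length of one sentence (0.0 if it has no dependencies)."""
    lengths = dependency_lengths(heads)
    return mean(lengths) if lengths else 0.0


# "the old dog barked": the -> dog, old -> dog, dog -> barked, barked = root
example_heads = [3, 3, 4, 0]
print(dependency_lengths(example_heads))
print(average_dependency_length(example_heads))
```

Averaging per sentence and then per time period is one possible aggregation; pooling all dependencies of a period before averaging is another, and the two can give different values.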
Modeling Syntactic Dependencies in Southern Dutch Dialects
Loic De Langhe | Jasper Degraeuwe | Melissa Farasyn | Veronique Hoste
Dependency parsing of non-normative language varieties remains a challenge for modern NLP. While contemporary parsers excel at standardized languages, dialectal variation – especially in function words, conjunctives, and verb clustering – introduces syntactic ambiguity that disrupts traditional parsing approaches. In this paper, we conduct a quantitative evaluation of syntactic dependencies in Southern Dutch dialects, leveraging a standardized dialect corpus to isolate syntactic effects from lexical variation. Using a neural biaffine dependency parser with various mono- and multilingual transformer-based encoders, we benchmark parsing performance on standard Dutch, dialectal data, and mixed training sets. Our results demonstrate that incorporating dialect-specific data significantly enhances parsing accuracy, yet certain syntactic structures remain difficult to resolve, even with dedicated adaptation. These findings highlight the need for more nuanced parsing strategies and improved syntactic modeling for non-normative language varieties.
Assessing the Agreement Competence of Large Language Models
Alba Táboas García | Leo Wanner
While the competence of LLMs to cope with agreement constraints has been widely tested in English, only a very limited number of works deal with morphologically rich(er) languages. In this work, we experiment with 25 mono- and multilingual LLMs, applying them to a collection of more than 5,000 test examples that cover the main agreement phenomena in three Romance languages (Italian, Portuguese, and Spanish) and one Slavic language (Russian). We identify which of the agreement phenomena are most difficult for which models and challenge some common assumptions of what makes a good model. The test suites into which the test examples are organized are openly available and can be easily adapted to other agreement phenomena and other languages for further research.
Introducing KIParla Forest: seeds for a UD annotation of interactional syntax
Ludovica Pannitto | Eleonora Zucchini | Silvia Ballarè | Cristina Bosco | Caterina Mauri | Manuela Sanguinetti
The present project endeavors to enrich the linguistic resources available for Italian by introducing KIParla Forest, a treebank for the KIParla corpus, an existing and well-known resource for spoken Italian. This article contextualizes the project, describes the treebank creation process and design choices, and outlines plans for future improvements.
Head-initial and head-Final coordinate structures in two annotation schemes of dependency grammar
Timothy John Osborne | Chenchen Song
The Universal Dependencies (UD) and Surface-Syntactic Universal Dependencies (SUD) annotation schemes view coordinate structures as head-initial. This contribution argues that a more flexible approach to coordinate structures is linguistically motivated, one that sees coordinate structures as head-initial in greater head-initial structures and as head-final in greater head-final structures. Support for this flexible approach comes from two areas: dependency distance and a nearness effect. In addition, two arguments that have been produced supporting the strictly head-initial approach are examined and refuted.
Genre Variation in Dependency Types: A Two-Level Genre Analysis Using the Czech National Corpus
Xinying Chen | Miroslav Kubát
This paper examines how dependency type distributions vary across genres in the Czech National Corpus (SYN2020). Using a two-level genre classification, broad categories and fine-grained subgenres, we identify genre-sensitive syntactic patterns through relative frequency analysis. The results show that some dependency types (e.g. Atr ‘attribute’) vary consistently across genres, while others (e.g. ExD ‘part of discourse ellipsis’) show sensitivity only at the subgenre level. Our dependency-based approach extends common multidimensional analyses based on lexical-grammatical co-occurrences, directly capturing syntactic evidence and improving interpretability. Our findings also highlight the importance of fine-grained genre distinctions in revealing syntactic variation.
A morpheme-based treebank for Gbaya, an Ubanguian language of Central Africa
Paulette Roulon-Doko | Sylvain Kahane | Bruno Guillaume
In this paper, we present the first treebank for Gbaya, a language from the under-resourced Niger-Congo family. The language has a rich system of tonal morphemes and virtually no affixes. The dependency analysis is based on a morpheme-based tokenisation, and the treebank is also distributed in a word-based Universal Dependencies version. Several constructions are discussed in the paper: genitive construction, clause coordination, sentence particles, adverbial and relative clauses, serial verb constructions, reported speech, topicalization, and focalization.
Dative alternations in less-researched syntactic patterns of standard Croatian
Matea Andrea Birtić | Siniša Runjaić | Robert Sviben
Dative alternation in double object constructions is a frequently researched syntactic phenomenon that has been investigated across the world's languages. Consequently, even relatively small and under-resourced languages like Croatian have seen influential studies on the topic. Recent syntactic and semantic analyses of verbs in standard Croatian have identified less-explored instances of dative alternation. This contribution aims to describe the alternation between dative case and prepositional phrase for the non-agentive and intransitive uses of the verb služiti (‘to serve’), as well as the dative alternation for the agentive and transitive uses of the verb izbjeći (‘to avoid’).
Distance and Projectivity as Predictors of Sentence Acceptability in Free Word Order Languages
Kirill Chuprinko | Artem Novozhilov | Arthur Stepanov
This study investigates how two core metrics rooted in Dependency Grammar, Minimal Dependency Distance (MDD) and projectivity, predict sentence acceptability in Russian and Serbo-Croatian. Using exhaustive word order permutations in controlled five-word sentences, we model how these metrics relate to acceptability judgments in two psycholinguistic experiments. While MDD has been widely studied as a processing constraint, projectivity violations have received less attention in acceptability modeling. We show that both significantly affect judgments, with projectivity playing a surprisingly strong role. In addition, Serbo-Croatian’s rigid clitic placement provides a natural test case for disentangling grammatical from processing constraints. Our findings offer a computationally precise, dependency-based model of acceptability that advances cognitively grounded language modeling for free word order languages.
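To make the two predictors in this abstract concrete, the sketch below enumerates word order permutations of a small dependency tree and reports, for each, the total dependency distance and whether the tree stays projective. It is only an illustration under stated assumptions: projectivity is tested with the standard crossing-arcs criterion (including an artificial root at position 0), distance is a plain sum over non-root arcs, and all function names are hypothetical. The abstract does not specify how the authors compute either metric.

```python
from itertools import permutations


def is_projective(heads):
    """True if no two arcs cross; heads[i] is the head of token i+1, 0 = root."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)


def total_distance(heads):
    """Sum of linear distances between each non-root token and its head."""
    return sum(abs(h - d) for d, h in enumerate(heads, start=1) if h != 0)


def reorder(heads, order):
    """Re-express the same tree after placing the original tokens in `order`.

    `order` lists original 1-based token ids in their new left-to-right order.
    """
    new_pos = {old: new for new, old in enumerate(order, start=1)}
    new_heads = [0] * len(heads)
    for old, head in enumerate(heads, start=1):
        new_heads[new_pos[old] - 1] = new_pos[head] if head != 0 else 0
    return new_heads


# Toy tree: token 2 is the root; tokens 1 and 3 depend on 2; token 4 depends on 3.
heads = [2, 0, 2, 3]
for order in permutations(range(1, len(heads) + 1)):
    permuted = reorder(heads, order)
    print(order, total_distance(permuted), is_projective(permuted))
```

Each printed row could then serve as a predictor row in a model of acceptability ratings; how the predictors are combined with judgments is left open here.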
UD Annotation of Experience Clauses in Tigrinya
Michael Gasser | Nazareth Amlesom Kifle
We are developing a treebank for Tigrinya within the Universal Dependencies (UD) framework. UD proposes a set of universal grammatical relations to capture dependency relations between words in any language. However, for some classes of verbs it is not straightforward to know which grammatical relations the verbs subcategorize for. In this paper we discuss the decisions we have had to make for the annotation of arguments of experience verbs in the Semitic language Tigrinya, which exhibit a number of unusual morphosyntactic properties. We describe a classification of experience verb roots in the language, based on the various ways in which the core experiencer and stimulus arguments are realized syntactically and morphologically and on which valence-changing operations the roots permit. We supplement our analysis with data from a morphological analysis of a Tigrinya corpus.
A corpus-driven description of OV order in Archaic Chinese
Qishen Wu | Santiago Herrera | Pierre Magistry | Sylvain Kahane
This paper presents a quantitative study of Object-Verb (OV) order in Archaic Chinese based on a Universal Dependencies (UD) treebank. Treating word order as a binary choice (OV vs VO), we train a sparse logistic-regression classifier that selects the most salient syntactic features needed for an accurate prediction, in order to investigate the specific syntactic contexts allowing OV word order and to identify to what extent these factors favour this order. The ranked features are understood as interpretable rules, and their coverage and precision as quantitative properties of each rule. The approach confirms earlier qualitative findings (e.g. pronoun object fronting and negation favour OV) and uncovers new contrasts in word order between different reflexive pronouns. It also identifies annotation errors that we corrected in the final analysis, illustrating how quantitative models, combined with fine-grained corpus analysis, can improve treebank quality. Our study demonstrates that lightweight machine-learning techniques applied to an existing syntactic resource can reveal fine-grained patterns in historical word order and that the approach can be reapplied to other languages.
Periphrastic Verb Forms in Universal Dependencies
Lenka Krippnerová | Daniel Zeman
We propose a generalization of the morphological annotation in Universal Dependencies (UD) to phrases spanning multiple words, possibly discontinuous. Our focus area is that of periphrastic tenses, voices and other forms, typically consisting of a non-finite content verb combined with one or more auxiliaries; however, the same approach can be applied to other morphosyntactic constructions. We present a software tool that can detect periphrastic verb forms, extract the relevant morphological features from member words and combine them into new, phrase-level annotation. The tool currently detects periphrastic verb forms in 15 Slavic languages that are represented in UD and it is easily adaptable to other constructions and languages. Both the tool and the processed Slavic data are freely available.
Word Order Variation in Spoken and Written Corpora: A Cross-Linguistic Study of SVO and Alternative Orders
Nives Hüll | Kaja Dobrovoljc
This study investigates word order variation in spoken and written corpora across five Indo-European languages: English, French, Norwegian (Nynorsk), Slovenian, and Spanish. Using Universal Dependencies treebanks, we analyze the distribution of six canonical word orders (SVO, SOV, VSO, VOS, OSV, OVS). Our results reveal that spoken language consistently exhibits greater word order flexibility than written language. This increased flexibility manifests as a decrease in the dominant SVO pattern and a rise in alternative orders, though the extent of this variation differs across languages. Morphologically rich languages such as Slovenian and Spanish show the most pronounced shifts, while English remains syntactically rigid across modalities. These findings support the claim that modality significantly affects syntactic realizations and highlight the need for typological studies to account for spoken data.
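The six-way classification described in this abstract can be sketched as follows. The example assumes UD-style tokens given as (id, head, deprel) tuples and counts a clause only when one head has both an `nsubj` and an `obj` dependent; these extraction criteria and the function name are assumptions for illustration, since the abstract does not state the authors' exact rules (for instance, whether relation subtypes or pronominal arguments are included).

```python
from collections import Counter


def word_orders(tokens):
    """Yield an order label such as 'SVO' for each head with both nsubj and obj.

    tokens: iterable of (id, head, deprel) tuples with 1-based ids.
    If a head has several dependents of one type, the last one seen is used.
    """
    args_by_head = {}
    for token_id, head, deprel in tokens:
        if deprel in ("nsubj", "obj"):
            args_by_head.setdefault(head, {})[deprel] = token_id
    for verb, args in args_by_head.items():
        if "nsubj" in args and "obj" in args:
            positions = {"S": args["nsubj"], "V": verb, "O": args["obj"]}
            # Sort the three labels by linear position to obtain the order string.
            yield "".join(sorted(positions, key=positions.get))


# Two toy clauses: subject-verb-object and object-subject-verb.
svo_clause = [(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj")]
osv_clause = [(1, 3, "obj"), (2, 3, "nsubj"), (3, 0, "root")]

counts = Counter()
for clause in (svo_clause, osv_clause):
    counts.update(word_orders(clause))
print(counts)
```

Dividing each count by the total number of counted clauses gives the relative frequency of each order, which can then be compared between a spoken and a written treebank of the same language.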