2025
«Are you Afraid of Ghosts?» A Proposal for Busting Predicate Ellipsis in Universal Dependencies
Claudia Corbetta, Federica Iurescia, Marco Carlo Passarotti
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)
This paper addresses the representation of ellipsis in dependency syntax, proposing both a theoretical and a practical workflow for its analysis and annotation in treebanks within the state-of-the-art Universal Dependencies framework. We discuss the challenges of annotating ellipsis, focusing on predicate ellipsis and its representation in dependency treebanks, and emphasize the importance of accounting for such phenomena in syntactic analysis and machine learning applications. We present a case study based on the Italian-Old treebank that demonstrates the applicability of the proposed workflows, and we invite the community to join this initiative with their own languages.
2024
Join Together? Combining Data to Parse Italian Texts
Claudia Corbetta, Giovanni Moretti, Marco Passarotti
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
In this paper, we train and evaluate non-combined and combined models on Old and Contemporary Italian data to determine whether enlarging the training set through a combined model improves parsing accuracy and thereby facilitates manual annotation. We find that, despite the larger training set, in-domain parsing performs better. We also find that models trained on Old Italian data perform better on Contemporary Italian data than vice versa. We attempt to explain this result in terms of syntactic complexity, finding that Old Italian texts exhibit longer sentences and a higher non-projectivity rate.
The Rise and Fall of Dependency Parsing in Dante Alighieri’s Divine Comedy
Claudia Corbetta, Marco Passarotti, Giovanni Moretti
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
In this paper, we conduct parsing experiments on Dante Alighieri’s Divine Comedy, an Old Italian poem composed between 1306 and 1321 and organized into three Cantiche: Inferno, Purgatorio, and Paradiso. We parse subsets of the poem using as training data both a Modern Italian training set and sections of the Divine Comedy itself, to evaluate in which scenarios parsers achieve higher scores. We find that in-domain training data yields better results, with an increase of approximately 17% in Unlabeled Attachment Score (UAS) and 25-30% in Labeled Attachment Score (LAS). We then briefly comment on the differences in scores across subsections of the Cantiche, and we run exploratory parsing experiments on a text from the same period and in the same style as the Divine Comedy.
2023
Highway to Hell. Towards a Universal Dependencies Treebank for Dante Alighieri’s Comedy
Claudia Corbetta, Marco Passarotti, Flavio Massimiliano Cecchini, Giovanni Moretti
Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)