Emiliana Pulido
2026
The Spanish Learner and Heritage Speaker Dependency Treebank
Valeria Pagliai | Sergio José Salazar Rodó | Emiliana Pulido | Andres Gutierrez-Quintero | Zoey Liu
Proceedings of the Society for Computation in Linguistics 2026
We present a manually curated L2-Heritage Speaker Spanish dataset (N = 49,247) following the Universal Dependencies framework, including lemmatizations, part-of-speech tags, syntactic dependencies, and instances of pro-drop and ungrammatical structures. In addition, for dependency parsing, we examined different data partitioning strategies and data representations, as well as different training configurations using our data and the AnCora treebank. Overall, the results show reasonable LAS scores and comparable performance between AnCora and our dataset.
2025
I Speak for the Árboles: Developing a Dependency Treebank for Spanish L2 and Heritage Speakers
Emiliana Pulido | Robert Pugh | Zoey Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
We introduce the first dependency treebank containing Universal Dependencies (UD) annotations for Spanish learner writing from the UC Davis COWSL2H corpus. Our annotations include lemmatization, POS tagging, and syntactic dependencies. We adapt the existing UD framework for Spanish L1 to account for learner-specific features such as code-switching and non-canonical syntax. A suite of parsing evaluation experiments shows that parsers trained on learner data together with moderate sizes of Spanish L1 data can yield reasonable performance. Our annotations are openly accessible to motivate future development of learner-oriented language technologies.