I Speak for the Árboles: Developing a Dependency Treebank for Spanish L2 and Heritage Speakers

Emiliana Pulido, Robert Pugh, Zoey Liu


Abstract
We introduce the first dependency treebank containing Universal Dependencies (UD) annotations for Spanish learner writing from the UC Davis COWSL2H corpus. Our annotations include lemmatization, POS tagging, and syntactic dependencies. We adapt the existing UD framework for Spanish L1 to account for learner-specific features such as code-switching and non-canonical syntax. A suite of parsing evaluation experiments shows that parsers trained on learner data together with moderate amounts of Spanish L1 data can yield reasonable performance. Our annotations are openly accessible to motivate future development of learner-oriented language technologies.
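The abstract does not name the evaluation metric. Parser comparisons on UD treebanks commonly report unlabeled and labeled attachment scores (UAS/LAS): the share of tokens whose predicted head matches the gold head, and the share whose head and dependency relation both match. The following minimal sketch assumes gold and predicted parses in the standard 10-column CoNLL-U format with identical tokenization. The function names and the toy sentence are hypothetical illustrations, not material from the treebank. Following the convention of the CoNLL 2018 shared-task scorer, relation subtypes (e.g. `obl:arg`) are ignored for LAS.

```python
def read_conllu(text):
    """Parse CoNLL-U text into sentences of (HEAD, DEPREL) pairs.

    Comment lines, multiword-token ranges (IDs like "1-2") and empty
    nodes (IDs like "3.1") are skipped; blank lines end a sentence.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue
        current.append((cols[6], cols[7]))  # HEAD and DEPREL columns
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold_text, pred_text):
    """Return (UAS, LAS) for predicted parses against gold parses."""
    gold, pred = read_conllu(gold_text), read_conllu(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data differ in sentence count")
    total = head_hits = labeled_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("tokenization mismatch between gold and prediction")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                head_hits += 1
                # Compare only the universal part of the relation label.
                if g_rel.split(":")[0] == p_rel.split(":")[0]:
                    labeled_hits += 1
    if total == 0:
        return 0.0, 0.0
    return head_hits / total, labeled_hits / total


# Hypothetical toy sentence: the "parser" attaches "español" to the right
# head but labels it obl instead of obj.
GOLD = "\n".join([
    "# text = Yo hablo español",
    "\t".join(["1", "Yo", "yo", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "hablo", "hablar", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "español", "español", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "",
])
PRED = GOLD.replace("2\tobj", "2\tobl")

uas, las = attachment_scores(GOLD, PRED)
print(f"UAS={uas:.3f} LAS={las:.3f}")  # UAS=1.000 LAS=0.667
```

Because a correct head with a wrong label still counts toward UAS, the gap between the two scores isolates labeling errors from attachment errors.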
Anthology ID: 2025.acl-srw.56
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Jin Zhao, Mingyang Wang, Zhu Liu
Venues: ACL | WS
Publisher: Association for Computational Linguistics
Pages: 814–822
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-srw.56/
Cite (ACL):
Emiliana Pulido, Robert Pugh, and Zoey Liu. 2025. I Speak for the Árboles: Developing a Dependency Treebank for Spanish L2 and Heritage Speakers. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 814–822, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): I Speak for the Árboles: Developing a Dependency Treebank for Spanish L2 and Heritage Speakers (Pulido et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-srw.56.pdf