The Spanish Learner and Heritage Speaker Dependency Treebank
Valeria Pagliai, Sergio José Salazar Rodó, Emiliana Pulido, Andres Gutierrez-Quintero, Zoey Liu
Abstract
We present a manually curated L2-Heritage Speaker Spanish dataset (N = 49,247) following the Universal Dependencies framework, including lemmatizations, part-of-speech tags, syntactic dependencies, and instances of pro-drop and ungrammatical structures. For dependency parsing, we also examined different data partitioning strategies and data representations, as well as different training configurations using our data and the AnCora treebank. Overall, the results show reasonable labeled attachment scores (LAS) and comparable performance between AnCora and our dataset.
- Anthology ID:
- 2026.scil-main.12
- Volume:
- Proceedings of the Society for Computation in Linguistics 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, CA
- Editors:
- Rob Voigt, Alex Warstadt, Naomi Feldman, Tal Linzen
- Venues:
- SCiL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 127–128
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.12/
- Cite (ACL):
- Valeria Pagliai, Sergio José Salazar Rodó, Emiliana Pulido, Andres Gutierrez-Quintero, and Zoey Liu. 2026. The Spanish Learner and Heritage Speaker Dependency Treebank. In Proceedings of the Society for Computation in Linguistics 2026, pages 127–128, San Diego, CA. Association for Computational Linguistics.
- Cite (Informal):
- The Spanish Learner and Heritage Speaker Dependency Treebank (Pagliai et al., SCiL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.12.pdf