Revisiting the Effects of Leakage on Dependency Parsing

Nathaniel Krasner, Miriam Wanner, Antonios Anastasopoulos


Abstract
Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations. In this work we revisit this claim, testing it on more models and languages. We find that it only holds for zero-shot cross-lingual settings. We then propose a more fine-grained measure of such leakage which, unlike the original measure, not only explains but also correlates with observed performance variation. Code and data are available here: https://github.com/miriamwanner/reu-nlp-project
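The abstract defines leakage only informally, as overlap between training and test graphs. Below is a minimal sketch of the coarsest possible version, computed from CoNLL-U files. It treats a test sentence as leaked if its full sequence of (HEAD, DEPREL) values, ignoring word forms, also appears verbatim in training. This is a hypothetical illustration. The function names `read_conllu_graphs` and `exact_graph_leakage` and the exact-match criterion are assumptions. They are neither the definition used by Søgaard (2020) nor the finer-grained measure proposed in this paper.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation (an assumption, not the paper's): one
# (HEAD, DEPREL) pair per token. Word forms are deliberately ignored so
# that only the labeled graph structure is compared.
Graph = Tuple[Tuple[int, str], ...]


def read_conllu_graphs(lines: Iterable[str]) -> List[Graph]:
    """Minimal CoNLL-U reader returning one labeled graph per sentence."""
    graphs: List[Graph] = []
    current: List[Tuple[int, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # a blank line ends the current sentence
            if current:
                graphs.append(tuple(current))
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # skip multiword tokens and empty nodes
            continue
        current.append((int(cols[6]), cols[7]))  # HEAD and DEPREL columns
    if current:  # the input may not end with a blank line
        graphs.append(tuple(current))
    return graphs


def exact_graph_leakage(train: List[Graph], test: List[Graph]) -> float:
    """Share of test graphs that also occur verbatim in the training set."""
    if not test:
        return 0.0
    seen: Set[Graph] = set(train)
    return sum(g in seen for g in test) / len(test)


def make_row(i: int, head: int, rel: str) -> str:
    """Hypothetical helper: build a 10-column CoNLL-U token line."""
    return "\t".join([str(i), "w", "_", "_", "_", "_", str(head), rel, "_", "_"])


train = read_conllu_graphs([make_row(1, 2, "nsubj"), make_row(2, 0, "root"), make_row(3, 2, "punct"), ""])
test = read_conllu_graphs([
    make_row(1, 2, "nsubj"), make_row(2, 0, "root"), make_row(3, 2, "punct"), "",  # same shape as training
    make_row(1, 0, "root"), make_row(2, 1, "obj"), "",  # unseen shape
])
print(exact_graph_leakage(train, test))  # 0.5
```

Exact whole-sentence matching is deliberately coarse. A finer-grained variant, not necessarily the one taken in the paper, would compare units smaller than entire sentences.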
Anthology ID: 2022.findings-acl.230
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2925–2934
URL: https://aclanthology.org/2022.findings-acl.230
DOI: 10.18653/v1/2022.findings-acl.230
Cite (ACL): Nathaniel Krasner, Miriam Wanner, and Antonios Anastasopoulos. 2022. Revisiting the Effects of Leakage on Dependency Parsing. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2925–2934, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Revisiting the Effects of Leakage on Dependency Parsing (Krasner et al., Findings 2022)
PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-acl.230.pdf
Video: https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-acl.230.mp4
Code: miriamwanner/reu-nlp-project