Expanding the Universal Dependencies Ancient Hebrew Treebank with Constituency Data

Daniel G. Swanson


Abstract
This paper presents an effort to expand the annotation pipeline for the Ancient Hebrew Universal Dependencies treebank to make use of additional data, resulting in the addition of over 4000 sentences and roughly 100K words to the released treebank. The resulting treebank contains 5500 sentences and 145K words, and the incorporation of converted constituency data has resulted in an annotation process that requires manual intervention in only around 15-20% of sentences, even in previously unseen genres.
Anthology ID:
2025.tlt-1.3
Volume:
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Sarah Jablotschkin, Sandra Kübler, Heike Zinsmeister
Venues:
TLT | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
23–31
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.tlt-1.3/
Cite (ACL):
Daniel G. Swanson. 2025. Expanding the Universal Dependencies Ancient Hebrew Treebank with Constituency Data. In Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025), pages 23–31, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
Expanding the Universal Dependencies Ancient Hebrew Treebank with Constituency Data (Swanson, TLT-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.tlt-1.3.pdf