A GitHub-based Workflow for Annotated Resource Development

Brandon Waldon, Nathan Schneider


Abstract
Computational linguists have long recognized the value of version control systems such as Git (and related platforms, e.g., GitHub) for managing and distributing computer code. However, the benefits of version control remain under-explored for a central activity within computational linguistics: the development of annotated natural language resources. We argue that researchers can employ version control practices to make development workflows more transparent, efficient, consistent, and participatory. We report a proof-of-concept, GitHub-based solution that facilitated the creation of a legal English treebank.
Anthology ID:
2025.law-1.27
Volume:
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Siyao Peng, Ines Rehbein
Venues:
LAW | WS
Publisher:
Association for Computational Linguistics
Pages:
326–331
URL:
https://preview.aclanthology.org/landing_page/2025.law-1.27/
DOI:
10.18653/v1/2025.law-1.27
Cite (ACL):
Brandon Waldon and Nathan Schneider. 2025. A GitHub-based Workflow for Annotated Resource Development. In Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025), pages 326–331, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
A GitHub-based Workflow for Annotated Resource Development (Waldon & Schneider, LAW 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.law-1.27.pdf