How to Create Treebanks without Human Annotators – An Indigenous Language Grammar Checker for Treebank Construction

Linda Wiechetek, Flammie A Pirinen, Maja Lisa Kappfjell


Abstract
Creating treebanks for low-resource languages is an important task. However, low-resource Indigenous language contexts are limited not only in text data but also in the human resources available for linguistic annotation. We suggest a workaround: applying a rule-based dependency parser implemented in Constraint Grammar to do the work of creating a marked-up treebank. However, because South Sámi written texts contain a lot of noise, i.e., spelling and grammatical errors, this tool often fails to create complete and correct trees. To address this, we created a grammar checking tool for the most common South Sámi grammatical error types, which significantly improves the quality of the dependency parser's output. As literacy and normative standards for most Indigenous languages are much more recent than for majority languages, spelling and grammatical variation and errors are a common source of noise, and a correction tool like ours can be useful in the construction of treebanks for these languages.
Anthology ID: 2025.tlt-1.14
Volume: Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)
Month: August
Year: 2025
Address: Ljubljana, Slovenia
Editors: Sarah Jablotschkin, Sandra Kübler, Heike Zinsmeister
Venues: TLT | WS | SyntaxFest
SIG: SIGPARSE
Publisher: Association for Computational Linguistics
Pages: 119–128
URL: https://preview.aclanthology.org/corrections-2025-08/2025.tlt-1.14/
Cite (ACL): Linda Wiechetek, Flammie A Pirinen, and Maja Lisa Kappfjell. 2025. How to Create Treebanks without Human Annotators – An Indigenous Language Grammar Checker for Treebank Construction. In Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025), pages 119–128, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal): How to Create Treebanks without Human Annotators – An Indigenous Language Grammar Checker for Treebank Construction (Wiechetek et al., TLT-SyntaxFest 2025)
PDF: https://preview.aclanthology.org/corrections-2025-08/2025.tlt-1.14.pdf