Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

Jörg Tiedemann, Johanna Nichols, Ronald Sprouse


Abstract
This paper presents ongoing work on creating NLP tools for under-resourced languages from very sparse training data coming from linguistic field work. In this work, we focus on Ingush, a Nakh-Daghestanian language spoken by about 300,000 people in the Russian republics of Ingushetia and Chechnya. We present work on morphosyntactic taggers trained on transcribed and linguistically analyzed recordings, and on dependency parsers that use English glosses to project annotation for creating synthetic treebanks. Our preliminary results are promising, supporting the goal of bootstrapping efficient NLP tools with limited or no task-specific annotated data resources available.
Anthology ID:
W16-4020
Volume:
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Erhard Hinrichs, Marie Hinrichs, Thorsten Trippel
Venue:
LT4DH
Publisher:
The COLING 2016 Organizing Committee
Pages:
148–155
URL:
https://aclanthology.org/W16-4020
Cite (ACL):
Jörg Tiedemann, Johanna Nichols, and Ronald Sprouse. 2016. Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pages 148–155, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work (Tiedemann et al., LT4DH 2016)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/W16-4020.pdf