An Inflectional Database for Gitksan
Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, Miikka Silfverberg
Abstract
This paper presents a new inflectional resource for Gitksan, a low-resource Indigenous language of Canada. We use Gitksan data in interlinear glossed format, stemming from language documentation efforts, to build a database of partial inflection tables. We then enrich this morphological resource by filling in blank slots in the partial inflection tables using neural transformer reinflection models. We extend the training data for our transformer reinflection models using two data augmentation techniques: data hallucination and back-translation. Experimental results demonstrate substantial improvements from data augmentation, with data hallucination delivering particularly impressive gains. We also release reinflection models for Gitksan.
- Anthology ID:
- 2022.lrec-1.710
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 6597–6606
- URL:
- https://aclanthology.org/2022.lrec-1.710
- Cite (ACL):
- Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, and Miikka Silfverberg. 2022. An Inflectional Database for Gitksan. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6597–6606, Marseille, France. European Language Resources Association.
- Cite (Informal):
- An Inflectional Database for Gitksan (Oliver et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2022.lrec-1.710.pdf
- Code:
- mpsilfve/gitksan-data
- Data:
- Universal Dependencies