SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing
Taiqi He, Lindia Tjuatja, Nathaniel Robinson, Shinji Watanabe, David R. Mortensen, Graham Neubig, Lori Levin
Abstract
In our submission to the SIGMORPHON 2023 Shared Task on interlinear glossing, we explore data augmentation and modeling approaches across seven low-resource languages. For data augmentation, we test two strategies: creating artificial data from the provided training data and utilizing existing interlinear glossed text (IGT) resources in other languages. On the modeling side, we test an enhanced version of the provided token classification baseline as well as a pretrained multilingual seq2seq model. Additionally, we apply dictionary-based post-correction for Gitksan, the language with the smallest amount of data. Our token classification models perform best, achieving the highest word-level accuracy for Arapaho and the highest morpheme-level accuracy for Gitksan of all submissions. We also show that data augmentation is an effective strategy, although pretraining on artificial data affects the two models very differently.

- Anthology ID: 2023.sigmorphon-1.22
- Volume: Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Garrett Nicolai, Eleanor Chodroff, Frederic Mailhot, Çağrı Çöltekin
- Venue: SIGMORPHON
- SIG: SIGMORPHON
- Publisher: Association for Computational Linguistics
- Pages: 209–216
- URL: https://aclanthology.org/2023.sigmorphon-1.22
- DOI: 10.18653/v1/2023.sigmorphon-1.22
- Cite (ACL): Taiqi He, Lindia Tjuatja, Nathaniel Robinson, Shinji Watanabe, David R. Mortensen, Graham Neubig, and Lori Levin. 2023. SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 209–216, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing (He et al., SIGMORPHON 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/2023.sigmorphon-1.22.pdf
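The abstract reports results as word-level and morpheme-level accuracy. As a rough illustration only, the sketch below assumes each sentence is a list of words and each word is a list of morpheme glosses, and scores predictions position by position against the gold glosses. The function names and data layout are hypothetical; this is not the shared task's official evaluation script.

```python
from typing import List

# Assumed layout: a sentence is a list of words; a word is a list of morpheme glosses.
Sentence = List[List[str]]


def word_accuracy(pred: List[Sentence], gold: List[Sentence]) -> float:
    """Fraction of gold words whose full gloss sequence is predicted exactly."""
    correct = total = 0
    for k, g_sent in enumerate(gold):
        # Missing predicted sentences count as empty (all words wrong).
        p_sent = pred[k] if k < len(pred) else []
        for i, g_word in enumerate(g_sent):
            total += 1
            if i < len(p_sent) and p_sent[i] == g_word:
                correct += 1
    return correct / total if total else 0.0


def morpheme_accuracy(pred: List[Sentence], gold: List[Sentence]) -> float:
    """Fraction of gold morpheme glosses matched at the same word and morpheme position."""
    correct = total = 0
    for k, g_sent in enumerate(gold):
        p_sent = pred[k] if k < len(pred) else []
        for i, g_word in enumerate(g_sent):
            # Missing predicted words count as empty (all morphemes wrong).
            p_word = p_sent[i] if i < len(p_sent) else []
            for j, g_morph in enumerate(g_word):
                total += 1
                if j < len(p_word) and p_word[j] == g_morph:
                    correct += 1
    return correct / total if total else 0.0


# Toy example: gold glosses "1SG-see dog-PL", prediction "1SG-see dog-SG".
gold = [[["1SG", "see"], ["dog", "PL"]]]
pred = [[["1SG", "see"], ["dog", "SG"]]]
print(word_accuracy(pred, gold))      # 0.5
print(morpheme_accuracy(pred, gold))  # 0.75
```

The toy example shows why the two metrics can diverge: a single wrong morpheme gloss makes the whole word count as wrong at the word level, while three of the four morphemes still score as correct at the morpheme level.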