Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations
Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, Lori Levin
Abstract
Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers. Manual production of IGT takes time and requires linguistic expertise. We attempt to address this issue by creating automatic glossing models, using modern multi-source neural models that additionally leverage easy-to-collect translations. We further explore cross-lingual transfer and a simple output length control mechanism, further refining our models. Evaluated on three challenging low-resource scenarios, our approach significantly outperforms a recent, state-of-the-art baseline, particularly improving on overall accuracy as well as lemma and tag recall.
- Anthology ID:
- 2020.coling-main.471
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 5397–5408
- URL:
- https://aclanthology.org/2020.coling-main.471
- DOI:
- 10.18653/v1/2020.coling-main.471
- Cite (ACL):
- Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, and Lori Levin. 2020. Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5397–5408, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations (Zhao et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2020.coling-main.471.pdf