Abstract
Interlinear Morphological Glosses are annotations produced in the context of language documentation. Their goal is to identify morphs occurring in an L1 sentence and to make explicit their function and meaning, with the further support of an associated translation in L2. We study here the task of automatic glossing, aiming to provide linguists with adequate tools to facilitate this process. Our formalisation of glossing uses a latent variable Conditional Random Field (CRF), which labels the L1 morphs while simultaneously aligning them to L2 words. In experiments with several under-resourced languages, we show that this approach is both effective and data-efficient and mitigates the problem of annotating unknown morphs. We also discuss various design choices regarding the alignment process and the selection of features. We finally demonstrate that it can benefit from multilingual (pre-)training, achieving results which outperform very strong baselines.
- Anthology ID:
- 2023.findings-emnlp.396
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5958–5971
- URL:
- https://aclanthology.org/2023.findings-emnlp.396
- DOI:
- 10.18653/v1/2023.findings-emnlp.396
- Cite (ACL):
- Shu Okabe and François Yvon. 2023. Towards Multilingual Interlinear Morphological Glossing. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5958–5971, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Towards Multilingual Interlinear Morphological Glossing (Okabe & Yvon, Findings 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2023.findings-emnlp.396.pdf