Abstract
We present, to our knowledge, the first published morphological analyser and generator for Sakha, a marginalised language of Siberia. The transducer, developed using HFST, has coverage solidly above 90% and high precision. In developing the analyser, we have expanded linguistic knowledge about Sakha and developed strategies for complex grammatical patterns. The transducer is already being used in downstream tasks, including computer-assisted language learning applications for linguistic maintenance and computational linguistic shared tasks.
- Anthology ID:
- 2022.lrec-1.550
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 5137–5142
- URL:
- https://aclanthology.org/2022.lrec-1.550
- Cite (ACL):
- Sardana Ivanova, Jonathan Washington, and Francis Tyers. 2022. A Free/Open-Source Morphological Analyser and Generator for Sakha. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5137–5142, Marseille, France. European Language Resources Association.
- Cite (Informal):
- A Free/Open-Source Morphological Analyser and Generator for Sakha (Ivanova et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2022.lrec-1.550.pdf
- Code:
- apertium/apertium-sah