Abstract
This study contributes to documentation and revitalization efforts for the endangered Laz language, a member of the South Caucasian language family spoken mainly on the northeastern coastline of Turkey. It takes the first steps toward a general computational model for word form recognition and production for Laz by building a rule-based morphological analyser using the Helsinki Finite-State Toolkit (HFST). The evaluation results show that the analyser has 64.9% coverage over a corpus of 111,365 tokens collected for this study. We also performed an error analysis on 100 randomly selected tokens from the corpus that are not covered by the analyser; the results show that the errors mostly result from Turkish words in the corpus and missing stems in our lexicon.
- Anthology ID:
- R19-1101
- Volume:
- Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
- Month:
- September
- Year:
- 2019
- Address:
- Varna, Bulgaria
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 869–877
- URL:
- https://aclanthology.org/R19-1101
- DOI:
- 10.26615/978-954-452-056-4_101
- Cite (ACL):
- Esra Onal and Francis Tyers. 2019. Building a Morphological Analyser for Laz. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 869–877, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal):
- Building a Morphological Analyser for Laz (Onal & Tyers, RANLP 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/R19-1101.pdf