A Finite-State Morphological Analyser for Sindhi

Raveesh Motlani, Francis Tyers, Dipti Sharma


Abstract
Morphological analysis is a fundamental task in natural-language processing and feeds other NLP applications such as part-of-speech tagging, syntactic parsing, information retrieval, and machine translation. In this paper, we present our work on the development of a free/open-source finite-state morphological analyser for Sindhi. We use Apertium’s lttoolbox as the finite-state toolkit to implement the transducer. The system follows a paradigm-based approach, in which a paradigm defines all the word forms and their morphological features for a given stem (lemma). We evaluated the system on the Sindhi Wikipedia corpus and achieved a coverage of 81% and a precision of over 97%.
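The paradigm-based approach and the coverage figure in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English stems, the paradigm name `toy_noun`, the tag strings, and the helper names `build_analyser` and `coverage` are hypothetical and are not taken from the paper or from lttoolbox. It also assumes that coverage means the fraction of corpus tokens that receive at least one analysis.

```python
from collections import defaultdict

# Toy paradigm table (hypothetical, not from the paper): each paradigm maps
# a suffix to a string of morphological tags.
PARADIGMS = {
    "toy_noun": [("", "<n><sg>"), ("s", "<n><pl>")],
}

# Toy lexicon (hypothetical): stem -> name of the paradigm it inflects with.
LEXICON = {"cat": "toy_noun", "dog": "toy_noun"}


def build_analyser(lexicon, paradigms):
    """Expand every stem through its paradigm into a surface-form lookup table."""
    table = defaultdict(list)
    for stem, paradigm_name in lexicon.items():
        for suffix, tags in paradigms[paradigm_name]:
            # Surface form -> lemma followed by its tags.
            table[stem + suffix].append(stem + tags)
    return dict(table)


def coverage(tokens, table):
    """Fraction of tokens that receive at least one analysis (assumed definition)."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in table)
    return known / len(tokens)


ANALYSER = build_analyser(LEXICON, PARADIGMS)
```

A lookup such as `ANALYSER["cats"]` returns the lemma with its tags, and `coverage(tokens, ANALYSER)` reports how much of a token list the table recognises. A real finite-state toolkit compiles the same stem and paradigm information into a transducer rather than a dictionary.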
Anthology ID:
L16-1409
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2572–2577
URL:
https://aclanthology.org/L16-1409
Cite (ACL):
Raveesh Motlani, Francis Tyers, and Dipti Sharma. 2016. A Finite-State Morphological Analyser for Sindhi. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2572–2577, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Finite-State Morphological Analyser for Sindhi (Motlani et al., LREC 2016)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/L16-1409.pdf