Abstract
Automatic morphology induction is important for computational processing of natural language. In resource-scarce languages in particular, it offers the possibility of supplementing data-driven strategies of Natural Language Processing with morphological rules that may cater for out-of-vocabulary words. Unfortunately, popular approaches to unsupervised morphology induction do not work for some of the most productive morphological processes of the Yorùbá language. To the best of our knowledge, the automatic induction of morphological processes such as full and partial reduplication, infixation, interfixation and compounding, particularly those based on the affixation of stem-derived morphemes, has not been adequately addressed in the literature. This study proposes a method for the automatic detection of stem-derived morphemes in Yorùbá. Words in a Yorùbá lexicon of 14,670 word-tokens were clustered around “word-labels”. A word-label is a textual proxy of the patterns imposed on words by the morphological processes through which they were formed. Results confirm a conjectured significant difference between the predicted and observed probabilities of word-labels motivated by stem-derived morphemes. This difference was used as a basis for the automatic identification of words formed by the affixation of stem-derived morphemes.
Keywords: Unsupervised Morphology Induction, Recurrent Partials, Recurrent Patterns, Stem-derived Morphemes, Word-labels.
- Anthology ID:
- 2022.sigul-1.19
- Volume:
- Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Maite Melero, Sakriani Sakti, Claudia Soria
- Venue:
- SIGUL
- SIG:
- SIGUL
- Publisher:
- European Language Resources Association
- Pages:
- 146–154
- URL:
- https://aclanthology.org/2022.sigul-1.19
- Cite (ACL):
- Tunde Adegbola. 2022. Automatic Detection of Morphological Processes in the Yorùbá Language. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, pages 146–154, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Automatic Detection of Morphological Processes in the Yorùbá Language (Adegbola, SIGUL 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.sigul-1.19.pdf