Abstract
The purpose of this article is to demonstrate that the recently developed automated rule-based syllabification system for Sesotho can be used broadly across the officially recognised South African Sotho-Tswana language group, which encompasses Sepedi, Sesotho and Setswana. We evaluate the automatic syllabification system on 400 words: the 100 most frequently used and the 100 least frequently used words in each of Sepedi and Setswana, as attested in the publicly available Autshumato corpus. We find that the Sesotho rule-based syllabification system can correctly identify vowel-only syllables, consonant-vowel syllables and consonant-only syllables in Sepedi and Setswana. Among other findings, we demonstrate that words with diacritics, as in the case of Sepedi, are correctly broken down into syllables. We make two main recommendations. First, the syllabification rules should be updated so that Sepedi diacritics are accommodated. Second, the syllabification system should be updated to reflect the broader Sotho-Tswana language group instead of being limited to Sesotho. Further research is needed to ascertain whether the complex consonant [ny] behaves similarly in all three officially recognised Sotho-Tswana languages, and to evaluate the need for a specific rule for the [ny] digraph.
- Anthology ID:
- 2023.rail-1.9
- Volume:
- Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editors:
- Rooweither Mabuya, Don Mthobela, Mmasibidi Setaka, Menno Van Zaanen
- Venue:
- RAIL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 76–85
- URL:
- https://aclanthology.org/2023.rail-1.9
- DOI:
- 10.18653/v1/2023.rail-1.9
- Cite (ACL):
- Johannes Sibeko and Mmasibidi Setaka. 2023. Evaluating the Sesotho rule-based syllabification system on Sepedi and Setswana words. In Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023), pages 76–85, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- Evaluating the Sesotho rule-based syllabification system on Sepedi and Setswana words (Sibeko & Setaka, RAIL 2023)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2023.rail-1.9.pdf
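The three syllable types the abstract reports (vowel-only, consonant-vowel and consonant-only) can be illustrated with a minimal Python sketch. This is not the authors' Sesotho system. The rules below are simplified assumptions: close a syllable after each vowel; treat m or n before a consonant as a syllabic nasal unless it begins the digraphs ng or ny; and treat any trailing consonants as a consonant-only syllable. The names `syllabify`, `syllable_type` and `word_accuracy` are hypothetical. Diacritics such as Sepedi š are handled by comparing base letters after Unicode decomposition.

```python
import unicodedata

VOWELS = set("aeiou")
NASALS = set("mn")


def base_letter(ch: str) -> str:
    # Strip combining diacritics so that e.g. 'š' compares as 's' and 'ê' as 'e'.
    return unicodedata.normalize("NFD", ch)[0].lower()


def syllabify(word: str) -> list[str]:
    """Hypothetical, simplified Sotho-Tswana syllabifier (not the published rule set)."""
    # Work on precomposed characters so a vowel and its diacritic stay together.
    word = unicodedata.normalize("NFC", word)
    syllables: list[str] = []
    current = ""
    i = 0
    while i < len(word):
        ch = word[i]
        b = base_letter(ch)
        nxt = base_letter(word[i + 1]) if i + 1 < len(word) else ""
        # Assumed rule: m/n starting a syllable and followed by a consonant is a
        # syllabic nasal (consonant-only syllable), except the digraphs ng and ny.
        if (not current and b in NASALS and nxt and nxt not in VOWELS
                and not (b == "n" and nxt in ("g", "y"))):
            syllables.append(ch)
            i += 1
            continue
        current += ch
        if b in VOWELS:
            # A vowel closes the syllable (V or CV, with complex onsets allowed).
            syllables.append(current)
            current = ""
        i += 1
    if current:
        # Trailing consonants with no following vowel form a consonant-only syllable.
        syllables.append(current)
    return syllables


def syllable_type(syllable: str) -> str:
    """Classify a syllable as 'V' (vowel-only), 'CV' or 'C' (consonant-only)."""
    if not any(base_letter(c) in VOWELS for c in syllable):
        return "C"
    return "V" if len(syllable) == 1 else "CV"


def word_accuracy(gold: dict[str, list[str]]) -> float:
    """Share of words whose predicted syllables exactly match a gold segmentation."""
    correct = sum(syllabify(w) == syls for w, syls in gold.items())
    return correct / len(gold)


if __name__ == "__main__":
    for w in ["mme", "monna", "ngwana", "šoma"]:
        sylls = syllabify(w)
        print(w, sylls, [syllable_type(s) for s in sylls])
```

Exempting ny from the syllabic-nasal rule is itself an assumption here; the abstract identifies the behaviour of [ny] across the three languages as an open question. A `word_accuracy`-style exact-match score is one plausible way to compare predictions against a gold-standard word list, not necessarily the metric used in the paper.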