Aron Zahran


2023

Preparing a corpus of spoken Xhosa
Eva-Marie Bloom Ström | Onelisa Slater | Aron Zahran | Aleksandrs Berdicevskis | Anne Schumacher
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)

The aim of this paper is to describe ongoing work on an annotated corpus of spoken Xhosa. The data consists of natural spoken language and includes regional and social variation. We discuss the challenges encountered in preparing such data from a lower-resourced language for corpus use. We describe the annotation, the search interface, and pilot experiments on automatic glossing of this highly agglutinative language.