Kezia Oketch
2026
Learning from Scarcity: Building and Benchmarking Speech Technology for Sukuma
Macton Mgonzo | Kezia Oketch | Naome A Etori | Winnie Mang'eni | Elizabeth Fabian Nyaki | Michael Samwel Mollel
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Automatic Speech Recognition (ASR) systems are attracting increasing attention in both academia and industry. Despite their remarkable performance on high-resource languages, these systems are markedly less effective in low-resource settings. We present the first ASR system for Sukuma, one of the most severely under-resourced Tanzanian languages, and provide an open-source Sukuma speech corpus comprising 7.47 hours of carefully transcribed audio. The data, sourced primarily from Bible readings, was rigorously annotated to ensure phonetic and orthographic consistency, making it the most linguistically reliable resource currently available for Sukuma. To establish baselines, we train lightweight ASR and Text-to-Speech (TTS) models that demonstrate the feasibility of building end-to-end speech systems for this underrepresented language. This work addresses the challenges of developing language and communication tools for speakers of less-represented languages, particularly the scarcity of representative datasets and benchmarks, and highlights future research directions for linguistically challenging languages such as Sukuma. We make our data and code publicly available to facilitate reproducibility and further research.
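ASR baselines like these are conventionally evaluated with word error rate (WER), the word-level edit distance between a reference transcript and the system output, normalized by reference length. The abstract does not name the paper's metric or evaluation code, so the following self-contained sketch is purely illustrative:

```python
# Word error rate (WER): Levenshtein distance between the reference and
# hypothesis word sequences, divided by the reference length. Illustrative
# only -- the paper's own evaluation setup is not described in the abstract.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Placeholder strings, not real Sukuma data: one substitution in four words.
print(wer("bhana bha sukuma bhalilomba", "bhana bha sukuma bhalilombaga"))  # 0.25
```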
2025
Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring
Kezia Oketch | John P. Lalor | Yi Yang | Ahmed Abbasi
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Closed large language models (LLMs) such as GPT-4 have set state-of-the-art results across a number of NLP tasks and have become central to NLP and machine learning (ML)-driven solutions. Closed LLMs’ performance and wide adoption have sparked considerable debate about their accessibility in terms of availability, cost, and transparency. In this study, we perform a rigorous comparative analysis of eleven leading LLMs, spanning closed, open, and open-source LLM ecosystems, across text assessment and generation within automated essay scoring, as well as a separate evaluation on abstractive text summarization to examine generalization. Our findings reveal that for few-shot learning-based assessment of human-generated essays, open LLMs such as Llama 3 and Qwen 2.5 perform comparably to GPT-4 in terms of predictive performance, with no significant differences in disparate impact scores when considering age- or race-related fairness. For summarization, we find that open models also match GPT-4 in ROUGE and METEOR scores on the CNN/DailyMail benchmark, both in zero- and few-shot settings. Moreover, Llama 3 offers a substantial cost advantage, being up to 37 times more cost-efficient than GPT-4. For generative tasks, we find that essays generated by top open LLMs are comparable to closed LLMs in terms of their semantic composition/embeddings and ML-assessed scores. Our findings challenge the dominance of closed LLMs and highlight the democratizing potential of open LLMs, suggesting they can effectively bridge accessibility divides while maintaining competitive performance and fairness.
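Disparate impact, the fairness measure cited above, is standardly defined as the ratio of favorable-outcome rates between two demographic groups, with values near 1.0 indicating parity and the common "four-fifths rule" flagging values below 0.8. The abstract does not give the paper's exact operationalization, so the sketch below, with hypothetical scores, threshold, and group labels, is an assumption:

```python
# Disparate impact (DI): rate of favorable outcomes in one group divided by
# the rate in another. For essay scoring, "favorable" is modeled here as a
# score at or above a chosen threshold. All data below is hypothetical.

def disparate_impact(scores, groups, group_a, group_b, threshold):
    """DI = P(score >= threshold | group_a) / P(score >= threshold | group_b)."""
    def favorable_rate(group):
        members = [s for s, g in zip(scores, groups) if g == group]
        return sum(s >= threshold for s in members) / len(members)
    return favorable_rate(group_a) / favorable_rate(group_b)

# Hypothetical essay scores on a 1-6 scale with made-up age-group labels:
scores = [5, 3, 4, 6, 2, 5, 4, 3]
groups = ["young", "young", "old", "old", "young", "old", "young", "old"]
print(disparate_impact(scores, groups, "young", "old", threshold=4))  # ~0.67
```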