Chiamaka Chukwuneke


2022

pdf
IgboBERT Models: Building and Training Transformer Models for the Igbo Language
Chiamaka Chukwuneke | Ignatius Ezeani | Paul Rayson | Mahmoud El-Haj
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This work presents a standard Igbo named entity recognition (IgboNER) dataset as well as the results from training and fine-tuning state-of-the-art transformer IgboNER models. We discuss the process of our dataset creation - data collection and annotation and quality checking. We also present experimental processes involved in building an IgboBERT language model from scratch as well as fine-tuning it along with other non-Igbo pre-trained models for the downstream IgboNER task. Our results show that, although the IgboNER task benefited hugely from fine-tuning large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task. This work will contribute immensely to IgboNLP in particular as well as the wider African and low-resource NLP efforts Keywords: Igbo, named entity recognition, BERT models, under-resourced, dataset

pdf
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Adelani | Graham Neubig | Sebastian Ruder | Shruti Rijhwani | Michael Beukman | Chester Palen-Michel | Constantine Lignos | Jesujoba Alabi | Shamsuddeen Muhammad | Peter Nabende | Cheikh M. Bamba Dione | Andiswa Bukula | Rooweither Mabuya | Bonaventure F. P. Dossou | Blessing Sibanda | Happy Buzaaba | Jonathan Mukiibi | Godson Kalipe | Derguene Mbaye | Amelia Taylor | Fatoumata Kabore | Chris Chinenye Emezue | Anuoluwapo Aremu | Perez Ogayo | Catherine Gitau | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Allahsera Auguste Tapo | Tebogo Macucwa | Vukosi Marivate | Mboning Tchiaze Elvis | Tajuddeen Gwadabe | Tosin Adewumi | Orevaoghene Ahia | Joyce Nakatumba-Nabende | Neo Lerato Mokono | Ignatius Ezeani | Chiamaka Chukwuneke | Mofetoluwa Oluwaseun Adeyemi | Gilles Quentin Hacheme | Idris Abdulmumin | Odunayo Ogundepo | Oreen Yousuf | Tatiana Moteu | Dietrich Klakow
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.

2021

pdf
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9

Abstract We take a step towards addressing the under- representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state- of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.1
Search
Co-authors
Venues