Chiamaka Chukwuneke


2023

pdf
Findings of the 1st Shared Task on Multi-lingual Multi-task Information Retrieval at MRL 2023
Francesco Tinner | David Ifeoluwa Adelani | Chris Emezue | Mammad Hajili | Omer Goldman | Muhammad Farid Adilazuarda | Muhammad Dehan Al Kautsar | Aziza Mirsaidova | Müge Kural | Dylan Massey | Chiamaka Chukwuneke | Chinedu Mbonu | Damilola Oluwaseun Oloyede | Kayode Olaleye | Jonathan Atala | Benjamin A. Ajibade | Saksham Bassi | Rahul Aralikatte | Najoung Kim | Duygu Ataman
Proceedings of the 3rd Workshop on Multi-lingual Representation Learning (MRL)

pdf
Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo | Tajuddeen Gwadabe | Clara Rivera | Jonathan Clark | Sebastian Ruder | David Adelani | Bonaventure Dossou | Abdou Diop | Claytone Sikasote | Gilles Hacheme | Happy Buzaaba | Ignatius Ezeani | Rooweither Mabuya | Salomey Osei | Chris Emezue | Albert Kahira | Shamsuddeen Muhammad | Akintunde Oladipo | Abraham Owodunni | Atnafu Tonja | Iyanuoluwa Shode | Akari Asai | Anuoluwapo Aremu | Ayodele Awokoya | Bernard Opoku | Chiamaka Chukwuneke | Christine Mwase | Clemencia Siro | Stephen Arthur | Tunde Ajayi | Verrah Otiende | Andre Rubungo | Boyd Sinkala | Daniel Ajisafe | Emeka Onwuegbuzia | Falalu Lawan | Ibrahim Ahmad | Jesujoba Alabi | Chinedu Mbonu | Mofetoluwa Adeyemi | Mofya Phiri | Orevaoghene Ahia | Ruqayya Iro | Sonia Adhiambo
Findings of the Association for Computational Linguistics: EMNLP 2023

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems – those that retrieve answer content from other languages while serving people in their native language—offer a means of filling this gap. To this end, we create Our Dataset, the first cross-lingual QA dataset with a focus on African languages. Our Dataset includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, Our Dataset focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, Our Dataset proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.

2022

pdf
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Adelani | Graham Neubig | Sebastian Ruder | Shruti Rijhwani | Michael Beukman | Chester Palen-Michel | Constantine Lignos | Jesujoba Alabi | Shamsuddeen Muhammad | Peter Nabende | Cheikh M. Bamba Dione | Andiswa Bukula | Rooweither Mabuya | Bonaventure F. P. Dossou | Blessing Sibanda | Happy Buzaaba | Jonathan Mukiibi | Godson Kalipe | Derguene Mbaye | Amelia Taylor | Fatoumata Kabore | Chris Chinenye Emezue | Anuoluwapo Aremu | Perez Ogayo | Catherine Gitau | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Allahsera Auguste Tapo | Tebogo Macucwa | Vukosi Marivate | Mboning Tchiaze Elvis | Tajuddeen Gwadabe | Tosin Adewumi | Orevaoghene Ahia | Joyce Nakatumba-Nabende | Neo Lerato Mokono | Ignatius Ezeani | Chiamaka Chukwuneke | Mofetoluwa Oluwaseun Adeyemi | Gilles Quentin Hacheme | Idris Abdulmumin | Odunayo Ogundepo | Oreen Yousuf | Tatiana Moteu | Dietrich Klakow
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.

pdf
IgboBERT Models: Building and Training Transformer Models for the Igbo Language
Chiamaka Chukwuneke | Ignatius Ezeani | Paul Rayson | Mahmoud El-Haj
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This work presents a standard Igbo named entity recognition (IgboNER) dataset as well as the results from training and fine-tuning state-of-the-art transformer IgboNER models. We discuss the process of our dataset creation - data collection and annotation and quality checking. We also present experimental processes involved in building an IgboBERT language model from scratch as well as fine-tuning it along with other non-Igbo pre-trained models for the downstream IgboNER task. Our results show that, although the IgboNER task benefited hugely from fine-tuning large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task. This work will contribute immensely to IgboNLP in particular as well as the wider African and low-resource NLP efforts Keywords: Igbo, named entity recognition, BERT models, under-resourced, dataset

2021

pdf
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9

We take a step towards addressing the under- representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state- of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.1
Search
Co-authors
Venues