Catherine Gitau - ACL Anthology

Catherine Gitau

2023

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages
Cheikh M. Bamba Dione | David Ifeoluwa Adelani | Peter Nabende | Jesujoba Alabi | Thapelo Sindane | Happy Buzaaba | Shamsuddeen Hassan Muhammad | Chris Chinenye Emezue | Perez Ogayo | Anuoluwapo Aremu | Catherine Gitau | Derguene Mbaye | Jonathan Mukiibi | Blessing Sibanda | Bonaventure F. P. Dossou | Andiswa Bukula | Rooweither Mabuya | Allahsera Auguste Tapo | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Fatoumata Ouoba Kabore | Amelia Taylor | Godson Kalipe | Tebogo Macucwa | Vukosi Marivate | Tajuddeen Gwadabe | Mboning Tchiaze Elvis | Ikechukwu Onyenwe | Gratien Atindogbe | Tolulope Adelani | Idris Akinade | Olanrewaju Samuel | Marien Nahimana | Théogène Musabeyezu | Emile Niyomutabazi | Ester Chimhenga | Kudzai Gotosa | Patrick Mizha | Apelete Agbolo | Seydou Traore | Chinedu Uchechukwu | Aliyu Yusuf | Muhammad Abdullahi | Dietrich Klakow
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random field and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.

2022

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Ifeoluwa Adelani | Graham Neubig | Sebastian Ruder | Shruti Rijhwani | Michael Beukman | Chester Palen-Michel | Constantine Lignos | Jesujoba O. Alabi | Shamsuddeen H. Muhammad | Peter Nabende | Cheikh M. Bamba Dione | Andiswa Bukula | Rooweither Mabuya | Bonaventure F. P. Dossou | Blessing Sibanda | Happy Buzaaba | Jonathan Mukiibi | Godson Kalipe | Derguene Mbaye | Amelia Taylor | Fatoumata Kabore | Chris Chinenye Emezue | Anuoluwapo Aremu | Perez Ogayo | Catherine Gitau | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Allahsera Auguste Tapo | Tebogo Macucwa | Vukosi Marivate | Elvis Mboning | Tajuddeen Gwadabe | Tosin Adewumi | Orevaoghene Ahia | Joyce Nakatumba-Nabende | Neo L. Mokono | Ignatius Ezeani | Chiamaka Chukwuneke | Mofetoluwa Adeyemi | Gilles Q. Hacheme | Idris Abdulmumin | Odunayo Ogundepo | Oreen Yousuf | Tatiana Moteu Ngoli | Dietrich Klakow
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.

2021

MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9

We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.

Co-authors

Derguene Mbaye 3

Shamsuddeen Hassan Muhammad 3

Jonathan Mukiibi 3

Blessing Kudzaishe Sibanda 3

Tosin Adewumi 2

Mofetoluwa Adeyemi 2

Orevaoghene Ahia 2

Anuoluwapo Aremu 2

Andiswa Bukula 2

Chiamaka Chukwuneke 2

Cheikh M. Bamba Dione 2

Ignatius Ezeani 2

Tajuddeen Gwadabe 2

Godson Kalipe 2

Dietrich Klakow 2

Constantine Lignos 2

Rooweither Mabuya 2

Tebogo Macucwa 2

Vukosi Marivate 2

Victoire Memdjokam Koagne 2

Edwin Munkoh-Buabeng 2

Peter Nabende 2

Joyce Nakatumba-Nabende 2

Graham Neubig 2

Chester Palen-Michel 2

Shruti Rijhwani 2

Sebastian Ruder 2

Allahsera Auguste Tapo 2

Amelia Taylor 2

Muhammad Abdullahi 1

Idris Abdulmumin 1

Tolulope Adelani 1

Apelete Agbolo 1

Idris Akinade 1

Adewale Akinfaderin 1

Victor Akinode 1

Emmanuel Anebi 1

Aremu Anuoluwapo 1

Gratien Atindogbe 1

Ayodele Awokoya 1

Israel Abebe Azime 1

Tobius Saul Bateesa 1

Michael Beukman 1

Ester Chimhenga 1

Thierno Ibrahima DIOP 1

Abdoulaye Diallo 1

Daniel D’souza 1

Mboning Tchiaze Elvis 1

Abdoulaye Faye 1

Dibora Gebreyohannes 1

Kudzai Gotosa 1

Tajuddeen Rabiu Gwadabe 1

Gilles Q. Hacheme 1

Fatoumata Kabore 1

Maurice Katusiime 1

Julia Kreutzer 1

Mouhamadane MBOUP 1

Tendai Marengereke 1

Stephen Mayhew 1

Elvis Mboning 1

Patrick Mizha 1

Neo L. Mokono 1

Tatiana Moteu Ngoli 1

Gerald Muriuki 1

Théogène Musabeyezu 1

Deborah Nabagereka 1

Marien Nahimana 1

Emile Niyomutabazi 1

Rubungo Andre Niyongabo 1

Kelechi Nwaike 1

Kelechi Ogueji 1

Odunayo Ogundepo 1

Temilola Oloyede 1

Ikechukwu Onyenwe 1

Verrah Otiende 1

Fatoumata Ouoba Kabore 1

Samuel Oyerinde 1

Olanrewaju Samuel 1

Thapelo Sindane 1

Clemencia Siro 1

Seydou Traore 1

Chinedu Uchechukwu 1

Eric Peter Wairagala 1

Yvonne Wambui 1

Seid Muhie Yimam 1

Venues