Derguene Mbaye
2025
Task-Oriented Dialog Systems for the Senegalese Wolof Language
Derguene Mbaye | Moussa Diallo
Proceedings of the 31st International Conference on Computational Linguistics
In recent years, the rise of large language models (LLMs) has driven considerable interest in conversational agents. Although they offer considerable advantages, LLMs also present significant risks, such as hallucination, which hinder their widespread deployment in industry. Moreover, low-resource languages such as African ones are still underrepresented in these systems, which limits their performance in these languages. In this paper, we illustrate a more classical approach based on modular architectures of Task-oriented Dialog Systems (ToDS), which offer better control over outputs. We propose a chatbot generation engine based on the Rasa framework and a robust methodology for projecting annotations onto the Wolof language using an in-house machine translation system. In an evaluation of a generated chatbot trained on the Amazon Massive dataset, our Wolof intent classifier performs similarly to the one obtained for French, which is a resource-rich language. We also show that this approach is extensible to other low-resource languages, thanks to the intent classifier’s language-agnostic pipeline, simplifying the design of chatbots in these languages.
2023
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages
Cheikh M. Bamba Dione | David Ifeoluwa Adelani | Peter Nabende | Jesujoba Alabi | Thapelo Sindane | Happy Buzaaba | Shamsuddeen Hassan Muhammad | Chris Chinenye Emezue | Perez Ogayo | Anuoluwapo Aremu | Catherine Gitau | Derguene Mbaye | Jonathan Mukiibi | Blessing Sibanda | Bonaventure F. P. Dossou | Andiswa Bukula | Rooweither Mabuya | Allahsera Auguste Tapo | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Fatoumata Ouoba Kabore | Amelia Taylor | Godson Kalipe | Tebogo Macucwa | Vukosi Marivate | Tajuddeen Gwadabe | Mboning Tchiaze Elvis | Ikechukwu Onyenwe | Gratien Atindogbe | Tolulope Adelani | Idris Akinade | Olanrewaju Samuel | Marien Nahimana | Théogène Musabeyezu | Emile Niyomutabazi | Ester Chimhenga | Kudzai Gotosa | Patrick Mizha | Apelete Agbolo | Seydou Traore | Chinedu Uchechukwu | Aliyu Yusuf | Muhammad Abdullahi | Dietrich Klakow
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this paper, we present AfricaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in the UD. Evaluating on the AfricaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.
2022
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Ifeoluwa Adelani | Graham Neubig | Sebastian Ruder | Shruti Rijhwani | Michael Beukman | Chester Palen-Michel | Constantine Lignos | Jesujoba O. Alabi | Shamsuddeen H. Muhammad | Peter Nabende | Cheikh M. Bamba Dione | Andiswa Bukula | Rooweither Mabuya | Bonaventure F. P. Dossou | Blessing Sibanda | Happy Buzaaba | Jonathan Mukiibi | Godson Kalipe | Derguene Mbaye | Amelia Taylor | Fatoumata Kabore | Chris Chinenye Emezue | Anuoluwapo Aremu | Perez Ogayo | Catherine Gitau | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Allahsera Auguste Tapo | Tebogo Macucwa | Vukosi Marivate | Elvis Mboning | Tajuddeen Gwadabe | Tosin Adewumi | Orevaoghene Ahia | Joyce Nakatumba-Nabende | Neo L. Mokono | Ignatius Ezeani | Chiamaka Chukwuneke | Mofetoluwa Adeyemi | Gilles Q. Hacheme | Idris Abdulmumin | Odunayo Ogundepo | Oreen Yousuf | Tatiana Moteu Ngoli | Dietrich Klakow
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
David Ifeoluwa Adelani | Jesujoba Oluwadara Alabi | Angela Fan | Julia Kreutzer | Xiaoyu Shen | Machel Reid | Dana Ruiter | Dietrich Klakow | Peter Nabende | Ernie Chang | Tajuddeen Gwadabe | Freshia Sackey | Bonaventure F. P. Dossou | Chris Emezue | Colin Leong | Michael Beukman | Shamsuddeen H. Muhammad | Guyo D. Jarso | Oreen Yousuf | Andre N. Niyongabo Rubungo | Gilles Hacheme | Eric Peter Wairagala | Muhammad Umair Nasir | Benjamin A. Ajibade | Tunde Oluwaseyi Ajayi | Yvonne Wambui Gitau | Jade Abbott | Mohamed Ahmed | Millicent Ochieng | Anuoluwapo Aremu | Perez Ogayo | Jonathan Mukiibi | Fatoumata Ouoba Kabore | Godson Koffi Kalipe | Derguene Mbaye | Allahsera Auguste Tapo | Victoire M. Memdjokam Koagne | Edwin Munkoh-Buabeng | Valencia Wagner | Idris Abdulmumin | Ayodele Awokoya | Happy Buzaaba | Blessing Sibanda | Andiswa Bukula | Sam Manthalu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out of these datasets. This is primarily because many widely spoken languages are not well represented on the web and are therefore excluded from the large-scale crawls used to build datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a novel African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both additional languages and additional domains is to leverage small quantities of high-quality translation data to fine-tune large pre-trained models.
2021
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9
We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
Co-authors
- David Ifeoluwa Adelani 4
- Jesujoba Alabi 4
- Happy Buzaaba 4
- Bonaventure F. P. Dossou 4
- Chris Chinenye Emezue 4
- Shamsuddeen Hassan Muhammad 4
- Jonathan Mukiibi 4
- Perez Ogayo 4
- Blessing Kudzaishe Sibanda 4
- Anuoluwapo Aremu 3
- Andiswa Bukula 3
- Catherine Gitau 3
- Tajuddeen Gwadabe 3
- Dietrich Klakow 3
- Edwin Munkoh-Buabeng 3
- Peter Nabende 3
- Allahsera Auguste Tapo 3
- Jade Abbott 2
- Idris Abdulmumin 2
- Tosin Adewumi 2
- Mofetoluwa Adeyemi 2
- Orevaoghene Ahia 2
- Ayodele Awokoya 2
- Michael Beukman 2
- Chiamaka Chukwuneke 2
- Cheikh M. Bamba Dione 2
- Ignatius Ezeani 2
- Godson Kalipe 2
- Julia Kreutzer 2
- Constantine Lignos 2
- Rooweither Mabuya 2
- Tebogo Macucwa 2
- Vukosi Marivate 2
- Victoire Memdjokam Koagne 2
- Joyce Nakatumba-Nabende 2
- Graham Neubig 2
- Fatoumata Ouoba Kabore 2
- Chester Palen-Michel 2
- Shruti Rijhwani 2
- Sebastian Ruder 2
- Amelia Taylor 2
- Eric Peter Wairagala 2
- Oreen Yousuf 2
- Muhammad Abdullahi 1
- Tolulope Adelani 1
- Apelete Agbolo 1
- Mohamed Ahmed 1
- Tunde Oluwaseyi Ajayi 1
- Benjamin A. Ajibade 1
- Idris Akinade 1
- Adewale Akinfaderin 1
- Victor Akinode 1
- Emmanuel Anebi 1
- Aremu Anuoluwapo 1
- Gratien Atindogbe 1
- Israel Abebe Azime 1
- Tobius Saul Bateesa 1
- Ernie Chang 1
- Ester Chimhenga 1
- Thierno Ibrahima DIOP 1
- Davis David 1
- Abdoulaye Diallo 1
- Moussa Diallo 1
- Daniel D’souza 1
- Mboning Tchiaze Elvis 1
- Angela Fan 1
- Abdoulaye Faye 1
- Dibora Gebreyohannes 1
- Yvonne Wambui Gitau 1
- Kudzai Gotosa 1
- Tajuddeen Rabiu Gwadabe 1
- Gilles Q. Hacheme 1
- Gilles Hacheme 1
- Guyo D. Jarso 1
- Fatoumata Kabore 1
- Godson Koffi Kalipe 1
- Maurice Katusiime 1
- Colin Leong 1
- Mouhamadane MBOUP 1
- Sam Manthalu 1
- Tendai Marengereke 1
- Stephen Mayhew 1
- Elvis Mboning 1
- Victoire M. Memdjokam Koagne 1
- Patrick Mizha 1
- Neo L. Mokono 1
- Tatiana Moteu Ngoli 1
- Gerald Muriuki 1
- Théogène Musabeyezu 1
- Deborah Nabagereka 1
- Marien Nahimana 1
- Muhammad Umair Nasir 1
- Samba Ngom 1
- Emile Niyomutabazi 1
- Rubungo Andre Niyongabo 1
- Andre N. Niyongabo Rubungo 1
- Kelechi Nwaike 1
- Millicent Ochieng 1
- Nkiruka Odu 1
- Kelechi Ogueji 1
- Odunayo Ogundepo 1
- Temilola Oloyede 1
- Ikechukwu Onyenwe 1
- Iroro Orife 1
- Salomey Osei 1
- Verrah Otiende 1
- Samuel Oyerinde 1
- Paul Rayson 1
- Machel Reid 1
- Dana Ruiter 1
- Freshia Sackey 1
- Olanrewaju Samuel 1
- Xiaoyu Shen 1
- Thapelo Sindane 1
- Clemencia Siro 1
- Henok Tilaye 1
- Seydou Traore 1
- Chinedu Uchechukwu 1
- Valencia Wagner 1
- Yvonne Wambui 1
- Degaga Wolde 1
- Seid Muhie Yimam 1
- Aliyu Yusuf 1