Adewale Akinfaderin


2021

pdf
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9

We take a step towards addressing the under- representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state- of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.1

2020


Predicting and Analyzing Law-Making in Kenya
Oyinlola Babafemi | Adewale Akinfaderin
Proceedings of the The Fourth Widening Natural Language Processing Workshop

Modelling and analyzing parliamentary legislation, roll-call votes and order of proceedings in developed countries has received significant attention in recent years. In this paper, we focused on understanding the bills introduced in a developing democracy, the Kenyan bicameral parliament. We developed and trained machine learning models on a combination of features extracted from the bills to predict the outcome - if a bill will be enacted or not. We observed that the texts in a bill are not as relevant as the year and month the bill was introduced and the category the bill belongs to.


HausaMT v1.0: Towards English–Hausa Neural Machine Translation
Adewale Akinfaderin
Proceedings of the The Fourth Widening Natural Language Processing Workshop

Neural Machine Translation (NMT) for low-resource languages suffers from low performance because of the lack of large amounts of parallel data and language diversity. To contribute to ameliorating this problem, we built a baseline model for English–Hausa machine translation, which is considered a task for low–resource language. The Hausa language is the second largest Afro–Asiatic language in the world after Arabic and it is the third largest language for trading across a larger swath of West Africa countries, after English and French. In this paper, we curated different datasets containing Hausa–English parallel corpus for our translation. We trained baseline models and evaluated the performance of our models using the Recurrent and Transformer encoder–decoder architecture with two tokenization approaches: standard word–level tokenization and Byte Pair Encoding (BPE) subword tokenization.

pdf
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
Wilhelmina Nekoto | Vukosi Marivate | Tshinondiwa Matsila | Timi Fasubaa | Taiwo Fagbohungbe | Solomon Oluwole Akinola | Shamsuddeen Muhammad | Salomon Kabongo Kabenamualu | Salomey Osei | Freshia Sackey | Rubungo Andre Niyongabo | Ricky Macharm | Perez Ogayo | Orevaoghene Ahia | Musie Meressa Berhe | Mofetoluwa Adeyemi | Masabata Mokgesi-Selinga | Lawrence Okegbemi | Laura Martinus | Kolawole Tajudeen | Kevin Degila | Kelechi Ogueji | Kathleen Siminyu | Julia Kreutzer | Jason Webster | Jamiil Toure Ali | Jade Abbott | Iroro Orife | Ignatius Ezeani | Idris Abdulkadir Dangana | Herman Kamper | Hady Elsahar | Goodness Duru | Ghollah Kioko | Murhabazi Espoir | Elan van Biljon | Daniel Whitenack | Christopher Onyefuluchi | Chris Chinenye Emezue | Bonaventure F. P. Dossou | Blessing Sibanda | Blessing Bassey | Ayodele Olabiyi | Arshath Ramkilowan | Alp Öktem | Adewale Akinfaderin | Abdallah Bashir
Findings of the Association for Computational Linguistics: EMNLP 2020

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. ‘Low-resourced’-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.
Search
Co-authors