Jade Abbott
2025
The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages
Jenalea Rajab | Anuoluwapo Aremu | Everlyn Asiko Chimoto | Dale Dunbar | Graham Morrissey | Fadel Thior | Luandrie Potgieter | Jessica Ojo | Atnafu Lambebo Tonja | Wilhelmina NdapewaOnyothi Nekoto | Pelonomi Moiloa | Jade Abbott | Vukosi Marivate | Benjamin Rosman
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper presents the Esethu Framework, a sustainable data curation framework specifically designed to empower local communities and ensure equitable benefit-sharing from their linguistic resources. This framework is supported by the Esethu license, a novel community-centric data license. As a proof of concept, we introduce the Vuk’uzenzele isiXhosa Speech Dataset (ViXSD), an open-source corpus developed under the Esethu Framework and License. The dataset, containing read speech from native isiXhosa speakers enriched with demographic and linguistic metadata, demonstrates how community-driven licensing and curation principles can bridge resource gaps in automatic speech recognition (ASR) for African languages while safeguarding the interests of data creators. We describe the framework guiding dataset development, outline the Esethu license provisions, present the methodology for ViXSD, and report ASR experiments validating ViXSD’s usability in building and refining voice-driven applications for isiXhosa.
2024
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
Atul Kr. Ojha | Chao-hong Liu | Ekaterina Vylomova | Flammie Pirinen | Jade Abbott | Jonathan Washington | Nathaniel Oco | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
2023
Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)
Atul Kr. Ojha | Chao-hong Liu | Ekaterina Vylomova | Flammie Pirinen | Jade Abbott | Jonathan Washington | Nathaniel Oco | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)
Unsupervised Cross-lingual Word Embedding Representation for English-isiZulu
Derwin Ngomane | Rooweither Mabuya | Jade Abbott | Vukosi Marivate
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)
In this study, we investigate the effectiveness of using cross-lingual word embeddings for zero-shot transfer learning between a language with abundant resources, English, and a language with limited resources, isiZulu. IsiZulu is part of the South African Nguni language family, which is characterised by complex agglutinating morphology. We use VecMap, an open-source tool, to obtain cross-lingual word embeddings. To perform an extrinsic evaluation of the embeddings, we train a news classifier on labelled English data in order to categorise unlabelled isiZulu data using zero-shot transfer learning. Our model achieves a weighted average F1-score of 0.34. Our findings demonstrate that VecMap generates modular word embeddings in the cross-lingual space that have an impact on the downstream classifier used for zero-shot transfer learning.
2022
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)
Atul Kr. Ojha | Chao-Hong Liu | Ekaterina Vylomova | Jade Abbott | Jonathan Washington | Nathaniel Oco | Tommi A Pirinen | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
David Ifeoluwa Adelani | Jesujoba Oluwadara Alabi | Angela Fan | Julia Kreutzer | Xiaoyu Shen | Machel Reid | Dana Ruiter | Dietrich Klakow | Peter Nabende | Ernie Chang | Tajuddeen Gwadabe | Freshia Sackey | Bonaventure F. P. Dossou | Chris Emezue | Colin Leong | Michael Beukman | Shamsuddeen H. Muhammad | Guyo D. Jarso | Oreen Yousuf | Andre N. Niyongabo Rubungo | Gilles Hacheme | Eric Peter Wairagala | Muhammad Umair Nasir | Benjamin A. Ajibade | Tunde Oluwaseyi Ajayi | Yvonne Wambui Gitau | Jade Abbott | Mohamed Ahmed | Millicent Ochieng | Anuoluwapo Aremu | Perez Ogayo | Jonathan Mukiibi | Fatoumata Ouoba Kabore | Godson Koffi Kalipe | Derguene Mbaye | Allahsera Auguste Tapo | Victoire M. Memdjokam Koagne | Edwin Munkoh-Buabeng | Valencia Wagner | Idris Abdulmumin | Ayodele Awokoya | Happy Buzaaba | Blessing Sibanda | Andiswa Bukula | Sam Manthalu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out of these datasets. This is primarily because many widely spoken languages are not well represented on the web and are therefore excluded from the large-scale crawls used to build datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a novel African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both additional languages and additional domains is to leverage small quantities of high-quality translation data to fine-tune large pre-trained models.
2021
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9
We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
2020
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
Wilhelmina Nekoto | Vukosi Marivate | Tshinondiwa Matsila | Timi Fasubaa | Taiwo Fagbohungbe | Solomon Oluwole Akinola | Shamsuddeen Muhammad | Salomon Kabongo Kabenamualu | Salomey Osei | Freshia Sackey | Rubungo Andre Niyongabo | Ricky Macharm | Perez Ogayo | Orevaoghene Ahia | Musie Meressa Berhe | Mofetoluwa Adeyemi | Masabata Mokgesi-Selinga | Lawrence Okegbemi | Laura Martinus | Kolawole Tajudeen | Kevin Degila | Kelechi Ogueji | Kathleen Siminyu | Julia Kreutzer | Jason Webster | Jamiil Toure Ali | Jade Abbott | Iroro Orife | Ignatius Ezeani | Idris Abdulkadir Dangana | Herman Kamper | Hady Elsahar | Goodness Duru | Ghollah Kioko | Murhabazi Espoir | Elan van Biljon | Daniel Whitenack | Christopher Onyefuluchi | Chris Chinenye Emezue | Bonaventure F. P. Dossou | Blessing Sibanda | Blessing Bassey | Ayodele Olabiyi | Arshath Ramkilowan | Alp Öktem | Adewale Akinfaderin | Abdallah Bashir
Findings of the Association for Computational Linguistics: EMNLP 2020
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. ‘Low-resourced’-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Alina Karakanta | Atul Kr. Ojha | Chao-Hong Liu | Jade Abbott | John Ortega | Jonathan Washington | Nathaniel Oco | Surafel Melaku Lakew | Tommi A Pirinen | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
2019
Benchmarking Neural Machine Translation for Southern African Languages
Jade Abbott | Laura Martinus
Proceedings of the 2019 Workshop on Widening NLP
Unlike major Western languages, most African languages are very low-resourced. Furthermore, the resources that do exist are often scattered and difficult to obtain and discover. As a result, the data and code for existing research have rarely been shared, meaning researchers struggle to reproduce reported results, and almost no publicly available benchmarks or leaderboards for African machine translation models exist. To start to address these problems, we trained neural machine translation models for a subset of Southern African languages on publicly available datasets. We provide the code for training the models and evaluate the models on a newly released evaluation set, with the aim of starting a leaderboard for Southern African languages and spurring future research in the field.
Co-authors
- Chao-Hong Liu 4
- Varvara Logacheva 4
- Valentin Malykh 4
- Nathaniel Oco 4
- Atul Kr. Ojha 4
- Flammie A. Pirinen 4
- Jonathan Washington 4
- Xiaobing Zhao 4
- Bonaventure F. P. Dossou 3
- Chris Chinenye Emezue 3
- Julia Kreutzer 3
- Vukosi Marivate 3
- Shamsuddeen Hassan Muhammad 3
- Perez Ogayo 3
- Blessing Kudzaishe Sibanda 3
- Ekaterina Vylomova 3
- David Ifeoluwa Adelani 2
- Mofetoluwa Adeyemi 2
- Orevaoghene Ahia 2
- Adewale Akinfaderin 2
- Jesujoba Alabi 2
- Anuoluwapo Aremu 2
- Ayodele Awokoya 2
- Happy Buzaaba 2
- Ignatius Ezeani 2
- Laura Martinus 2
- Derguene Mbaye 2
- Jonathan Mukiibi 2
- Rubungo Andre Niyongabo 2
- Kelechi Ogueji 2
- Iroro Orife 2
- Salomey Osei 2
- Freshia Sackey 2
- Eric Peter Wairagala 2
- Idris Abdulmumin 1
- Tosin Adewumi 1
- Mohamed Ahmed 1
- Tunde Oluwaseyi Ajayi 1
- Benjamin A. Ajibade 1
- Victor Akinode 1
- Solomon Oluwole Akinola 1
- Jamiil Toure Ali 1
- Emmanuel Anebi 1
- Aremu Anuoluwapo 1
- Israel Abebe Azime 1
- Abdallah Bashir 1
- Blessing Bassey 1
- Tobius Saul Bateesa 1
- Musie Meressa Berhe 1
- Michael Beukman 1
- Andiswa Bukula 1
- Ernie Chang 1
- Everlyn Asiko Chimoto 1
- Chiamaka Chukwuneke 1
- Thierno Ibrahima DIOP 1
- Idris Abdulkadir Dangana 1
- Davis David 1
- Kevin Degila 1
- Abdoulaye Diallo 1
- Dale Dunbar 1
- Goodness Duru 1
- Daniel D’souza 1
- Hady Elsahar 1
- Murhabazi Espoir 1
- Taiwo Fagbohungbe 1
- Angela Fan 1
- Timi Fasubaa 1
- Abdoulaye Faye 1
- Dibora Gebreyohannes 1
- Catherine Gitau 1
- Yvonne Wambui Gitau 1
- Tajuddeen Rabiu Gwadabe 1
- Tajuddeen Gwadabe 1
- Gilles Hacheme 1
- Guyo D. Jarso 1
- Salomon Kabongo Kabenamualu 1
- Godson Koffi Kalipe 1
- Herman Kamper 1
- Alina Karakanta 1
- Maurice Katusiime 1
- Ghollah Kioko 1
- Dietrich Klakow 1
- Surafel Melaku Lakew 1
- Colin Leong 1
- Constantine Lignos 1
- Mouhamadane MBOUP 1
- Rooweither Mabuya 1
- Ricky Macharm 1
- Sam Manthalu 1
- Tendai Marengereke 1
- Tshinondiwa Matsila 1
- Stephen Mayhew 1
- Victoire M. Memdjokam Koagne 1
- Pelonomi Moiloa 1
- Masabata Mokgesi-Selinga 1
- Graham Morrissey 1
- Edwin Munkoh-Buabeng 1
- Gerald Muriuki 1
- Deborah Nabagereka 1
- Peter Nabende 1
- Joyce Nakatumba-Nabende 1
- Muhammad Umair Nasir 1
- Wilhelmina Nekoto 1
- Wilhelmina NdapewaOnyothi Nekoto 1
- Graham Neubig 1
- Samba Ngom 1
- Derwin Ngomane 1
- Andre N. Niyongabo Rubungo 1
- Kelechi Nwaike 1
- Millicent Ochieng 1
- Nkiruka Odu 1
- Jessica Ojo 1
- Lawrence Okegbemi 1
- Ayodele Olabiyi 1
- Temilola Oloyede 1
- Christopher Onyefuluchi 1
- John Ortega 1
- Verrah Otiende 1
- Fatoumata Ouoba Kabore 1
- Samuel Oyerinde 1
- Chester Palen-Michel 1
- Luandrie Potgieter 1
- Jenalea Rajab 1
- Arshath Ramkilowan 1
- Paul Rayson 1
- Machel Reid 1
- Shruti Rijhwani 1
- Benjamin Rosman 1
- Sebastian Ruder 1
- Dana Ruiter 1
- Xiaoyu Shen 1
- Kathleen Siminyu 1
- Clemencia Siro 1
- Kolawole Tajudeen 1
- Allahsera Auguste Tapo 1
- Fadel Thior 1
- Henok Tilaye 1
- Atnafu Lambebo Tonja 1
- Valencia Wagner 1
- Yvonne Wambui 1
- Jason Webster 1
- Daniel Whitenack 1
- Degaga Wolde 1
- Seid Muhie Yimam 1
- Oreen Yousuf 1
- Elan van Biljon 1
- Alp Öktem 1