Amelia Taylor - ACL Anthology

Amelia Taylor

2023

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages
Cheikh M. Bamba Dione | David Ifeoluwa Adelani | Peter Nabende | Jesujoba Alabi | Thapelo Sindane | Happy Buzaaba | Shamsuddeen Hassan Muhammad | Chris Chinenye Emezue | Perez Ogayo | Anuoluwapo Aremu | Catherine Gitau | Derguene Mbaye | Jonathan Mukiibi | Blessing Sibanda | Bonaventure F. P. Dossou | Andiswa Bukula | Rooweither Mabuya | Allahsera Auguste Tapo | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Fatoumata Ouoba Kabore | Amelia Taylor | Godson Kalipe | Tebogo Macucwa | Vukosi Marivate | Tajuddeen Gwadabe | Mboning Tchiaze Elvis | Ikechukwu Onyenwe | Gratien Atindogbe | Tolulope Adelani | Idris Akinade | Olanrewaju Samuel | Marien Nahimana | Théogène Musabeyezu | Emile Niyomutabazi | Ester Chimhenga | Kudzai Gotosa | Patrick Mizha | Apelete Agbolo | Seydou Traore | Chinedu Uchechukwu | Aliyu Yusuf | Muhammad Abdullahi | Dietrich Klakow
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained on data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.

2022

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Ifeoluwa Adelani | Graham Neubig | Sebastian Ruder | Shruti Rijhwani | Michael Beukman | Chester Palen-Michel | Constantine Lignos | Jesujoba O. Alabi | Shamsuddeen H. Muhammad | Peter Nabende | Cheikh M. Bamba Dione | Andiswa Bukula | Rooweither Mabuya | Bonaventure F. P. Dossou | Blessing Sibanda | Happy Buzaaba | Jonathan Mukiibi | Godson Kalipe | Derguene Mbaye | Amelia Taylor | Fatoumata Kabore | Chris Chinenye Emezue | Anuoluwapo Aremu | Perez Ogayo | Catherine Gitau | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Allahsera Auguste Tapo | Tebogo Macucwa | Vukosi Marivate | Elvis Mboning | Tajuddeen Gwadabe | Tosin Adewumi | Orevaoghene Ahia | Joyce Nakatumba-Nabende | Neo L. Mokono | Ignatius Ezeani | Chiamaka Chukwuneke | Mofetoluwa Adeyemi | Gilles Q. Hacheme | Idris Abdulmumin | Odunayo Ogundepo | Oreen Yousuf | Tatiana Moteu Ngoli | Dietrich Klakow
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.

Co-authors

Cheikh M. Bamba Dione 2

Bonaventure F. P. Dossou 2

Chris Chinenye Emezue 2

Catherine Gitau 2

Tajuddeen Gwadabe 2

Godson Kalipe 2

Dietrich Klakow 2

Rooweither Mabuya 2

Tebogo Macucwa 2

Vukosi Marivate 2

Derguene Mbaye 2

Victoire Memdjokam Koagne 2

Shamsuddeen Hassan Muhammad 2

Jonathan Mukiibi 2

Edwin Munkoh-Buabeng 2

Peter Nabende 2

Blessing Kudzaishe Sibanda 2

Allahsera Auguste Tapo 2

Muhammad Abdullahi 1

Idris Abdulmumin 1

Tolulope Adelani 1

Tosin Adewumi 1

Mofetoluwa Adeyemi 1

Apelete Agbolo 1

Orevaoghene Ahia 1

Idris Akinade 1

Gratien Atindogbe 1

Michael Beukman 1

Ester Chimhenga 1

Chiamaka Chukwuneke 1

Mboning Tchiaze Elvis 1

Ignatius Ezeani 1

Kudzai Gotosa 1

Gilles Q. Hacheme 1

Fatoumata Kabore 1

Constantine Lignos 1

Elvis Mboning 1

Patrick Mizha 1

Neo L. Mokono 1

Tatiana Moteu Ngoli 1

Théogène Musabeyezu 1

Marien Nahimana 1

Joyce Nakatumba-Nabende 1

Graham Neubig 1

Emile Niyomutabazi 1

Odunayo Ogundepo 1

Ikechukwu Onyenwe 1

Fatoumata Ouoba Kabore 1

Chester Palen-Michel 1

Shruti Rijhwani 1

Sebastian Ruder 1

Olanrewaju Samuel 1

Thapelo Sindane 1

Seydou Traore 1

Chinedu Uchechukwu 1

Venues

acl 1
emnlp 1