Disambiguation of morpho-syntactic features of African American English – the case of habitual be

Harrison Santiago; Joshua Martin; Sarah Moeller; Kevin Tang

doi:10.18653/v1/2022.ltedi-1.9

Abstract

Recent research has highlighted that natural language processing (NLP) systems exhibit a bias against African American speakers. These errors are often caused by poor representation of linguistic features unique to African American English (AAE), which is due to the relatively low probability of occurrence for many such features. We present a workflow to overcome this issue in the case of habitual “be”. Habitual “be” is isomorphic, and therefore ambiguous, with other forms of uninflected “be” found in both AAE and General American English (GAE). This creates a clear challenge for bias in NLP technologies. To overcome the scarcity, we employ a combination of rule-based filters and data augmentation that generate a corpus balanced between habitual and non-habitual instances. This balanced corpus trains unbiased machine learning classifiers, as demonstrated on a corpus of AAE transcribed texts, achieving a .65 F₁ score at classifying habitual “be”.
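The abstract does not spell out the filter rules or the augmentation method, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical cue list (`NON_HABITUAL_CUES`) of words that, directly before “be”, mark a non-habitual use (infinitival “to be”, modal + “be”), and it uses plain random oversampling as a stand-in for data augmentation. All function names are illustrative.

```python
import random

# Hypothetical cue list (an assumption, not taken from the paper):
# a word from this set directly before "be" signals a non-habitual use.
NON_HABITUAL_CUES = {
    "to", "will", "would", "can", "could",
    "should", "must", "might", "may", "shall",
}


def rule_label(tokens, i):
    """Label the 'be' at tokens[i] as 'non-habitual' if a cue precedes it,
    otherwise as 'candidate' (still ambiguous, left for a classifier)."""
    prev = tokens[i - 1].lower() if i > 0 else ""
    return "non-habitual" if prev in NON_HABITUAL_CUES else "candidate"


def find_be(sentence):
    """Return (index, label) for every uninflected 'be' in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return [(i, rule_label(tokens, i))
            for i, tok in enumerate(tokens) if tok.lower() == "be"]


def balance(examples, seed=0):
    """Oversample minority labels until every label is equally frequent.
    Stand-in for data augmentation; `examples` is a list of (text, label)."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced
```

Under these assumptions, `find_be("They want to be there")` flags the instance as non-habitual, while `find_be("She be working late")` leaves it as a candidate for the classifier to resolve.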

Anthology ID: 2022.ltedi-1.9
Volume: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Bharathi Raja Chakravarthi, B Bharathi, John P McCrae, Manel Zarrouk, Kalika Bali, Paul Buitelaar
Venue: LTEDI
Publisher: Association for Computational Linguistics
Pages: 70–75
URL: https://preview.aclanthology.org/ingest-emnlp/2022.ltedi-1.9/
DOI: 10.18653/v1/2022.ltedi-1.9
Cite (ACL): Harrison Santiago, Joshua Martin, Sarah Moeller, and Kevin Tang. 2022. Disambiguation of morpho-syntactic features of African American English – the case of habitual be. In Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, pages 70–75, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Disambiguation of morpho-syntactic features of African American English – the case of habitual be (Santiago et al., LTEDI 2022)
PDF: https://preview.aclanthology.org/ingest-emnlp/2022.ltedi-1.9.pdf
Video: https://preview.aclanthology.org/ingest-emnlp/2022.ltedi-1.9.mp4
