Harrison Santiago


2022

pdf
Disambiguation of morpho-syntactic features of African American English – the case of habitual be
Harrison Santiago | Joshua Martin | Sarah Moeller | Kevin Tang
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Recent research has highlighted that natural language processing (NLP) systems exhibit a bias againstAfrican American speakers. These errors are often caused by poor representation of linguistic features unique to African American English (AAE), which is due to the relatively low probability of occurrence for many such features. We present a workflow to overcome this issue in the case of habitual “be”. Habitual “be” is isomorphic, and therefore ambiguous, with other forms of uninflected “be” found in both AAE and General American English (GAE). This creates a clear challenge for bias in NLP technologies. To overcome the scarcity, we employ a combination of rule-based filters and data augmentation that generate a corpus balanced between habitual and non-habitual instances. This balanced corpus trains unbiased machine learning classifiers, as demonstrated on a corpus of AAE transcribed texts, achieving .65 F1 score at classifying habitual “be”.