@inproceedings{santiago-etal-2022-disambiguation,
    title = "Disambiguation of morpho-syntactic features of {A}frican {A}merican {E}nglish {--} the case of habitual be",
    author = "Santiago, Harrison  and
      Martin, Joshua  and
      Moeller, Sarah  and
      Tang, Kevin",
    editor = "Chakravarthi, Bharathi Raja  and
      Bharathi, B  and
      McCrae, John P  and
      Zarrouk, Manel  and
      Bali, Kalika  and
      Buitelaar, Paul",
    booktitle = "Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2022.ltedi-1.9/",
    doi = "10.18653/v1/2022.ltedi-1.9",
    pages = "70--75",
    abstract = "Recent research has highlighted that natural language processing (NLP) systems exhibit a bias against African American speakers. These errors are often caused by poor representation of linguistic features unique to African American English (AAE), which is due to the relatively low probability of occurrence for many such features. We present a workflow to overcome this issue in the case of habitual ``be''. Habitual ``be'' is isomorphic, and therefore ambiguous, with other forms of uninflected ``be'' found in both AAE and General American English (GAE). This creates a clear challenge for bias in NLP technologies. To overcome the scarcity, we employ a combination of rule-based filters and data augmentation that generate a corpus balanced between habitual and non-habitual instances. This balanced corpus trains unbiased machine learning classifiers, as demonstrated on a corpus of AAE transcribed texts, achieving .65 F$_1$ score at classifying habitual ``be''."
}