Abstract
Recent studies show that NLP models trained on standard English texts tend to produce biased outcomes against underrepresented English varieties. In this work, we conduct a pioneering study of the English variety use of African American English (AAE) in NLI task. First, we propose CodeSwitch, a greedy unidirectional morphosyntactically-informed rule-based translation method for data augmentation. Next, we use CodeSwitch to present a preliminary study to determine if demographic language features do in fact influence models to produce false predictions. Then, we conduct experiments on two popular datasets and propose two simple, yet effective and generalizable debiasing methods. Our findings show that NLI models (e.g. BERT) trained under our proposed frameworks outperform traditional large language models while maintaining or even improving the prediction performance. In addition, we intend to release CodeSwitch, in hopes of promoting dialectal language diversity in training data to both reduce the discriminatory societal impacts and improve model robustness of downstream NLP tasks.- Anthology ID:
- 2022.coling-1.124
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- SIG:
- Publisher:
- International Committee on Computational Linguistics
- Note:
- Pages:
- 1442–1454
- Language:
- URL:
- https://aclanthology.org/2022.coling-1.124
- DOI:
- Cite (ACL):
- Jamell Dacon, Haochen Liu, and Jiliang Tang. 2022. Evaluating and Mitigating Inherent Linguistic Bias of African American English through Inference. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1442–1454, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- Evaluating and Mitigating Inherent Linguistic Bias of African American English through Inference (Dacon et al., COLING 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.coling-1.124.pdf
- Data
- MultiNLI, SNLI