Lawyers are Dishonest? Quantifying Representational Harms in Commonsense Knowledge Resources
Ninareh Mehrabi, Pei Zhou, Fred Morstatter, Jay Pujara, Xiang Ren, Aram Galstyan
Abstract
Warning: this paper contains content that may be offensive or upsetting. Commonsense knowledge bases (CSKB) are increasingly used for various natural language processing tasks. Since CSKBs are mostly human-generated and may reflect societal biases, it is important to ensure that such biases are not conflated with the notion of commonsense. Here we focus on two widely used CSKBs, ConceptNet and GenericsKB, and establish the presence of bias in the form of two types of representational harms, overgeneralization of polarized perceptions and representation disparity across different demographic groups in both CSKBs. Next, we find similar representational harms for downstream models that use ConceptNet. Finally, we propose a filtering-based approach for mitigating such harms, and observe that our filtered-based approach can reduce the issues in both resources and models but leads to a performance drop, leaving room for future work to build fairer and stronger commonsense models.- Anthology ID:
- 2021.emnlp-main.410
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 5016–5033
- Language:
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2021.emnlp-main.410/
- DOI:
- 10.18653/v1/2021.emnlp-main.410
- Cite (ACL):
- Ninareh Mehrabi, Pei Zhou, Fred Morstatter, Jay Pujara, Xiang Ren, and Aram Galstyan. 2021. Lawyers are Dishonest? Quantifying Representational Harms in Commonsense Knowledge Resources. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5016–5033, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Lawyers are Dishonest? Quantifying Representational Harms in Commonsense Knowledge Resources (Mehrabi et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2021.emnlp-main.410.pdf
- Data
- ConceptNet, GenericsKB