Abstract
A common factor in bias measurement methods is the use of hand-curated seed lexicons, but there remains little guidance for their selection. We gather seeds used in prior work, documenting their common sources and rationales, and in case studies of three English-language corpora, we enumerate the different types of social biases and linguistic features that, once encoded in the seeds, can affect subsequent bias measurements. Seeds developed in one context are often re-used in other contexts, but documentation and evaluation remain necessary precursors to relying on seeds for sensitive measurements.
- Anthology ID:
- 2021.acl-long.148
- Volume:
- Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Venues:
- ACL | IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1889–1904
- URL:
- https://aclanthology.org/2021.acl-long.148
- DOI:
- 10.18653/v1/2021.acl-long.148
- Cite (ACL):
- Maria Antoniak and David Mimno. 2021. Bad Seeds: Evaluating Lexical Methods for Bias Measurement. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1889–1904, Online. Association for Computational Linguistics.
- Cite (Informal):
- Bad Seeds: Evaluating Lexical Methods for Bias Measurement (Antoniak & Mimno, ACL-IJCNLP 2021)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2021.acl-long.148.pdf
- Code:
- maria-antoniak/bad-seeds