Aubrie Amstutz
2023
FairPrism: Evaluating Fairness-Related Harms in Text Generation
Eve Fleisig | Aubrie Amstutz | Chad Atalla | Su Lin Blodgett | Hal Daumé III | Alexandra Olteanu | Emily Sheng | Dan Vann | Hanna Wallach
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
It is critical to measure and mitigate fairness-related harms caused by AI text generation systems, including stereotyping and demeaning harms. To that end, we introduce FairPrism, a dataset of 5,000 examples of AI-generated English text with detailed human annotations covering a diverse set of harms relating to gender and sexuality. FairPrism aims to address several limitations of existing datasets for measuring and mitigating fairness-related harms, including improved transparency, clearer specification of dataset coverage, and accounting for annotator disagreement and harms that are context-dependent. FairPrism’s annotations include the extent of stereotyping and demeaning harms, the demographic groups targeted, and appropriateness for different applications. The annotations also include specific harms that occur in interactive contexts and harms that raise normative concerns when the “speaker” is an AI system. Due to its precision and granularity, FairPrism can be used to diagnose (1) the types of fairness-related harms that AI text generation systems cause, and (2) the potential limitations of mitigation methods, both of which we illustrate through case studies. Finally, the process we followed to develop FairPrism offers a recipe for building improved datasets for measuring and mitigating harms caused by AI systems.
2022
On Measures of Biases and Harms in NLP
Sunipa Dev | Emily Sheng | Jieyu Zhao | Aubrie Amstutz | Jiao Sun | Yu Hou | Mattie Sanseverino | Jiin Kim | Akihiro Nishi | Nanyun Peng | Kai-Wei Chang
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
Recent studies show that Natural Language Processing (NLP) technologies propagate societal biases about demographic groups associated with attributes such as gender, race, and nationality. To create interventions and mitigate these biases and associated harms, it is vital to be able to detect and measure such biases. While existing works propose bias evaluation and mitigation methods for various tasks, there remains a need to cohesively understand the biases and the specific harms they measure, and how different measures compare with each other. To address this gap, this work presents a practical framework of harms and a series of questions that practitioners can answer to guide the development of bias measures. As a validation of our framework and documentation questions, we also present several case studies of how existing bias measures in NLP—both intrinsic measures of bias in representations and extrinsic measures of bias of downstream applications—can be aligned with different harms, and how our proposed documentation questions facilitate a more holistic understanding of what bias measures are measuring.