Swetasudha Panda


2022

Upstream Mitigation Is Not All You Need: Testing the Bias Transfer Hypothesis in Pre-Trained Language Models
Ryan Steed | Swetasudha Panda | Ari Kobren | Michael Wick
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A few large, homogenous, pre-trained models undergird many machine learning systems — and often, these models contain harmful stereotypes learned from the internet. We investigate the bias transfer hypothesis: the theory that social biases (such as stereotypes) internalized by large language models during pre-training transfer into harmful task-specific behavior after fine-tuning. For two classification tasks, we find that reducing intrinsic bias with controlled interventions before fine-tuning does little to mitigate the classifier’s discriminatory behavior after fine-tuning. Regression analysis suggests that downstream disparities are better explained by biases in the fine-tuning dataset. Still, pre-training plays a role: simple alterations to co-occurrence rates in the fine-tuning dataset are ineffective when the model has been pre-trained. Our results encourage practitioners to focus more on dataset quality and context-specific harms.

Don’t Just Clean It, Proxy Clean It: Mitigating Bias by Proxy in Pre-Trained Models
Swetasudha Panda | Ari Kobren | Michael Wick | Qinlan Shen
Findings of the Association for Computational Linguistics: EMNLP 2022

Transformer-based pre-trained models are known to encode societal biases not only in their contextual representations, but also in downstream predictions when fine-tuned on task-specific data. We present D-Bias, an approach that selectively eliminates stereotypical associations (e.g., co-occurrence statistics) at fine-tuning, such that the model doesn’t learn to excessively rely on those signals. D-Bias attenuates biases from both identity words and frequently co-occurring proxies, which we select using pointwise mutual information. We apply D-Bias to a) occupation classification, and b) toxicity classification and find that our approach substantially reduces downstream biases (e.g., by > 60% in toxicity classification, for identities that are most frequently flagged as toxic on online platforms). In addition, we show that D-Bias dramatically improves upon scrubbing, i.e., removing only the identity words in question. We also demonstrate that D-Bias easily extends to multiple identities, and achieves competitive performance with two recently proposed debiasing approaches: R-LACE and INLP.