@inproceedings{yu-etal-2023-unlearning,
    title = "Unlearning Bias in Language Models by Partitioning Gradients",
    author = "Yu, Charles  and
      Jeoung, Sullam  and
      Kasi, Anish  and
      Yu, Pengfei  and
      Ji, Heng",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.findings-acl.375/",
    doi = "10.18653/v1/2023.findings-acl.375",
    pages = "6032--6048",
    abstract = "Recent research has shown that large-scale pretrained language models, specifically transformers, tend to exhibit issues relating to racism, sexism, religion bias, and toxicity in general. Unfortunately, these pretrained language models are used almost universally in downstream tasks, and natural language processing is often applied to make real-world predictions. Thus, debiasing these language models as early in development as possible is increasingly crucial for preventing unintentional harms caused by natural language systems. To this end, we propose a new technique called partitioned contrastive gradient unlearning (PCGU), a gray-box method for debiasing pretrained masked language models. PCGU aims to optimize only the weights that contribute most to a specific domain of bias, doing so by computing a first-order approximation based on the gradients of contrastive sentence pairs. Our experiments show that PCGU is both low-cost and seems particularly effective at pinpointing the sources of implicit social bias in large pretrained transformers. Although we train using PCGU in the gender-profession domain only, we find that doing so can also partially mitigate bias across other domains. All code for our implementation and experiments can be found at \url{https://github.com/CharlesYu2000/PCGU-UnlearningBias}."
}