2025
Red-Teaming for Uncovering Societal Bias in Large Language Models
Chu Fei Luo | Ahmad Ghawanmeh | Kashyap Coimbatore Murali | Bhimshetty Bharat Kumar | Murli Jadhav | Xiaodan Zhu | Faiza Khan Khattak
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Ensuring the safe deployment of AI systems is critical in industry settings where biased outputs can lead to significant operational, reputational, and regulatory risks. Thorough evaluation before deployment is essential to prevent these hazards. Red-teaming addresses this need by employing adversarial attacks to develop guardrails that detect and reject biased or harmful queries, enabling models to be retrained or steered away from harmful outputs. However, red-teaming techniques are often limited, and malicious actors may discover new vulnerabilities that bypass safety fine-tuning, underscoring the need for ongoing research and innovative approaches. Notably, most red-teaming efforts focus on harmful or unethical instructions rather than addressing social bias, leaving this critical area under-explored despite its significant real-world impact, especially in customer-facing AI systems. We propose two bias-specific red-teaming methods, Emotional Bias Probe (EBP) and BiasKG, to evaluate how standard safety measures for harmful content mitigate bias. For BiasKG, we refactor natural language stereotypes into a knowledge graph and use adversarial attacking strategies to induce biased responses from several open- and closed-source language models. We find our method increases bias in all models, even those trained with safety guardrails. Our work emphasizes uncovering societal bias in LLMs through rigorous evaluation, addressing adversarial challenges to ensure AI safety in high-stakes industry deployments.
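The abstract's idea of refactoring natural-language stereotypes into a knowledge graph can be illustrated with a minimal sketch. Everything below — the `<group> are <attribute>` sentence pattern, the `stereotyped_as` relation, and the helper names — is a hypothetical illustration, not the paper's actual BiasKG implementation.

```python
# Minimal illustrative sketch: turning simple natural-language stereotype
# statements into (subject, relation, object) triples and indexing them by
# subject. The schema and helpers are hypothetical, not the paper's method.

def stereotype_to_triple(sentence: str):
    """Parse a '<group> are <attribute>' statement into a triple."""
    subject, _, attribute = sentence.partition(" are ")
    return (subject.strip(), "stereotyped_as", attribute.strip().rstrip("."))

def build_graph(sentences):
    """Index triples by subject so related statements can be retrieved together."""
    graph = {}
    for sentence in sentences:
        subj, rel, obj = stereotype_to_triple(sentence)
        graph.setdefault(subj, []).append((rel, obj))
    return graph

graph = build_graph(["group A are trait X.", "group A are trait Y."])
```

A graph in this form lets an attacker retrieve all attributes linked to a target group and weave them into adversarial prompts, which is the general retrieval pattern the abstract describes.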
2024
Can Machine Unlearning Reduce Social Bias in Language Models?
Omkar Dige | Diljot Arneja | Tsz Fung Yau | Qixuan Zhang | Mohammad Bolandraftar | Xiaodan Zhu | Faiza Khan Khattak
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Mitigating bias in language models (LMs) has become a critical problem due to the widespread deployment of LMs in industry and customer-facing applications. Numerous approaches revolve around data pre-processing and subsequent fine-tuning of language models, tasks that can be both time-consuming and computationally demanding. As alternatives, machine unlearning techniques are being explored, yet there is a notable lack of comparative studies evaluating the effectiveness of these methods. In this work, we explore the effectiveness of two machine unlearning methods — Partitioned Contrastive Gradient Unlearning (PCGU), applied to decoder models, and Negation via Task Vector — and compare them with Direct Preference Optimization (DPO) for reducing social biases in open-source LMs such as LLaMA-2 and OPT. We also implement distributed PCGU for large models. It is empirically shown, through quantitative and qualitative analyses, that the Negation via Task Vector method outperforms PCGU and is comparable to DPO in debiasing models with minimal deterioration in model performance and perplexity. Negation via Task Vector reduces the bias score by 25.5% for LLaMA-2 and achieves bias reduction of up to 40% for OPT models. Moreover, it can be easily tuned to balance the trade-off between bias reduction and generation quality, unlike DPO.
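Negation via Task Vector follows the task-arithmetic recipe: take the difference between a model fine-tuned on bias-eliciting data and the base model, then subtract a scaled copy of that difference from the base weights. A minimal sketch with plain NumPy arrays; the toy weight vectors and the scaling coefficient `lam` are illustrative, and real use operates on the per-parameter tensors of an LM rather than a single array.

```python
import numpy as np

# Sketch of negation via task vector:
#   theta_new = theta_base - lam * (theta_ft - theta_base)
# where theta_ft is the model fine-tuned on the behavior to unlearn.

def negate_task_vector(theta_base, theta_ft, lam=1.0):
    """Subtract the scaled task vector (theta_ft - theta_base) from the base weights."""
    task_vector = theta_ft - theta_base
    return theta_base - lam * task_vector

theta_base = np.array([0.5, -0.2, 1.0])
theta_biased = np.array([0.8, -0.1, 1.2])  # e.g., fine-tuned on bias-eliciting data
theta_debiased = negate_task_vector(theta_base, theta_biased, lam=0.5)
```

The single scalar `lam` is what makes the trade-off mentioned in the abstract easy to tune: larger values remove more of the unwanted behavior at the cost of drifting further from the base model.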
2022
Bringing the State-of-the-Art to Customers: A Neural Agent Assistant Framework for Customer Service Support
Stephen Obadinma | Faiza Khan Khattak | Shirley Wang | Tania Sidhom | Elaine Lau | Sean Robertson | Jingcheng Niu | Winnie Au | Alif Munim | Karthik Raja K. Bhaskar | Bencheng Wei | Iris Ren | Waqar Muhammad | Erin Li | Bukola Ishola | Michael Wang | Griffin Tanner | Yu-Jia Shiah | Sean X. Zhang | Kwesi P. Apponsah | Kanishk Patel | Jaswinder Narain | Deval Pandya | Xiaodan Zhu | Frank Rudzicz | Elham Dolatabadi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Building Agent Assistants that can help improve customer service support requires inputs from industry users and their customers, as well as knowledge about state-of-the-art Natural Language Processing (NLP) technology. We combine expertise from academia and industry to bridge the gap and build task/domain-specific Neural Agent Assistants (NAA) with three high-level components for: (1) Intent Identification, (2) Context Retrieval, and (3) Response Generation. In this paper, we outline the pipeline of the NAA’s core system and also present three case studies in which three industry partners successfully adapt the framework to find solutions to their unique challenges. Our findings suggest that a collaborative process is instrumental in spurring the development of emerging NLP models for Conversational AI tasks in industry. The full reference implementation code and results are available at https://github.com/VectorInstitute/NAA.
2019
Extracting relevant information from physician-patient dialogues for automated clinical note taking
Serena Jeblee | Faiza Khan Khattak | Noah Crampton | Muhammad Mamdani | Frank Rudzicz
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)
We present a system for automatically extracting pertinent medical information from dialogues between clinicians and patients. The system parses each dialogue and extracts entities such as medications and symptoms, using context to predict which entities are relevant. We also classify the primary diagnosis for each conversation. In addition, we extract topic information and identify relevant utterances. This serves as a baseline for a system that extracts information from dialogues and automatically generates a patient note, which can be reviewed and edited by the clinician.
Predicting ICU transfers using text messages between nurses and doctors
Faiza Khan Khattak | Chloé Pou-Prom | Robert Wu | Frank Rudzicz
Proceedings of the 2nd Clinical Natural Language Processing Workshop
We explore the use of real-time clinical information, i.e., text messages sent between nurses and doctors regarding patient conditions, to predict transfer to the intensive care unit (ICU). Preliminary results on data from five hospitals indicate that, despite being short and full of noise, text messages can augment other visit information to improve the performance of ICU transfer prediction.