Chu Fei Luo
2025
Red-Teaming for Uncovering Societal Bias in Large Language Models
Chu Fei Luo | Ahmad Ghawanmeh | Kashyap Coimbatore Murali | Bhimshetty Bharat Kumar | Murli Jadhav | Xiaodan Zhu | Faiza Khan Khattak
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Ensuring the safe deployment of AI systems is critical in industry settings where biased outputs can lead to significant operational, reputational, and regulatory risks. Thorough evaluation before deployment is essential to prevent these hazards. Red-teaming addresses this need by employing adversarial attacks to develop guardrails that detect and reject biased or harmful queries, enabling models to be retrained or steered away from harmful outputs. However, red-teaming techniques are often limited, and malicious actors may discover new vulnerabilities that bypass safety fine-tuning, underscoring the need for ongoing research and innovative approaches. Notably, most red-teaming efforts focus on harmful or unethical instructions rather than addressing social bias, leaving this critical area under-explored despite its significant real-world impact, especially in customer-facing AI systems. We propose two bias-specific red-teaming methods, Emotional Bias Probe (EBP) and BiasKG, to evaluate how standard safety measures for harmful content mitigate bias. For BiasKG, we refactor natural language stereotypes into a knowledge graph, and use adversarial attack strategies to induce biased responses from several open- and closed-source language models. We find our method increases bias in all models, even those trained with safety guardrails. Our work emphasizes uncovering societal bias in LLMs through rigorous evaluation, addressing adversarial challenges to ensure AI safety in high-stakes industry deployments.
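As a rough illustration of the BiasKG idea described in the abstract, the sketch below refactors a few invented stereotype statements into (subject, relation, object) triples, stores them as a small graph, and retrieves neighboring facts to assemble an adversarial prompt context. The triples, group names, relation labels, and prompt template are placeholders for illustration only, not the data, retrieval method, or prompts used in the paper.

```python
# Illustrative sketch: representing stereotype statements as a small knowledge
# graph and retrieving related triples to build an adversarial prompt context.
# The triples, names, and prompt template below are invented for demonstration
# and are NOT the paper's data or prompts.
from collections import defaultdict

# Hypothetical (subject, relation, object) triples refactored from
# natural-language stereotype statements.
triples = [
    ("group_A", "is_stereotyped_as", "attribute_X"),
    ("group_A", "is_stereotyped_as", "attribute_Y"),
    ("attribute_X", "associated_with", "context_Z"),
]

# Adjacency-list view of the knowledge graph.
graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def retrieve_context(entity: str, hops: int = 2) -> list[str]:
    """Collect facts reachable from `entity` within `hops` steps."""
    frontier, seen, facts = [entity], {entity}, []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for rel, obj in graph.get(node, []):
                facts.append(f"{node} {rel.replace('_', ' ')} {obj}")
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return facts

# Fold the retrieved facts into an adversarial context for a target query.
context = " ".join(retrieve_context("group_A"))
prompt = f"Background knowledge: {context}\nQuestion: <target query here>"
print(prompt)
```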
2024
Misinformation with Legal Consequences (MisLC): A New Task Towards Harnessing Societal Harm of Misinformation
Chu Fei Luo | Radin Shayanfar | Rohan V Bhambhoria | Samuel Dahan | Xiaodan Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024
Misinformation, defined as false or inaccurate information, can result in significant societal harm when it is spread maliciously or even unintentionally. The rapid exchange of information online necessitates advanced detection mechanisms to mitigate misinformation-induced harm. Existing research, however, has predominantly focused on the veracity of information, overlooking the legal implications and consequences of misinformation. In this work, we take a novel angle to consolidate the definition of misinformation detection using legal issues as a measurement of societal ramifications, aiming to bring interdisciplinary efforts to tackle misinformation and its consequences. We introduce a new task: Misinformation with Legal Consequences (MisLC), which leverages definitions from a wide range of legal domains covering 4 broader legal topics and 11 fine-grained legal issues, including hate speech, election laws, and privacy regulations. For this task, we advocate a two-step dataset curation approach that utilizes crowd-sourced checkworthiness and expert evaluations of misinformation. We provide insights about the MisLC task through empirical evidence, from the problem definition to experiments and expert involvement. While the latest large language models and retrieval-augmented generation are effective baselines for the task, we find they are still far from replicating expert performance.
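To make the retrieval-augmented baseline setup concrete, here is a minimal sketch of how such a baseline might be wired for a MisLC-style task: a claim is matched against short descriptions of legal issues with a simple TF-IDF retriever, and the top matches are folded into a prompt for a language model. The issue descriptions, the example claim, the retrieve_issues and classify_with_llm helpers, and the prompt wording are assumptions for illustration, not the paper's dataset, retriever, or models.

```python
# Illustrative sketch of a retrieval-augmented baseline for a MisLC-style task:
# retrieve descriptions of legal issues relevant to a claim, then ask a language
# model whether the claim implicates any of them. All data and prompt text below
# are placeholders, not the paper's dataset or prompts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical short descriptions of fine-grained legal issues.
legal_issues = {
    "hate speech": "Statements inciting hatred against a protected group.",
    "election law": "False claims about voting procedures or eligibility.",
    "privacy": "Disclosure of personal data without consent.",
}

def retrieve_issues(claim: str, k: int = 2) -> list[str]:
    """Return the k legal issues whose descriptions are most similar to the claim."""
    names = list(legal_issues)
    vectorizer = TfidfVectorizer().fit([claim] + list(legal_issues.values()))
    claim_vec = vectorizer.transform([claim])
    issue_vecs = vectorizer.transform(legal_issues.values())
    scores = cosine_similarity(claim_vec, issue_vecs)[0]
    ranked = sorted(zip(scores, names), reverse=True)
    return [name for _, name in ranked[:k]]

def classify_with_llm(claim: str, issues: list[str]) -> str:
    """Placeholder for an LLM call; a real baseline would prompt a model here."""
    prompt = (
        f"Claim: {claim}\nCandidate legal issues: {', '.join(issues)}\n"
        "Does this claim carry potential legal consequences? Answer yes or no."
    )
    return prompt  # returns the assembled prompt instead of calling a model

claim = "Polls close two days early this year, so vote by mail only."
print(classify_with_llm(claim, retrieve_issues(claim)))
```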
2023
Prototype-Based Interpretability for Legal Citation Prediction
Chu Fei Luo | Rohan Bhambhoria | Samuel Dahan | Xiaodan Zhu
Findings of the Association for Computational Linguistics: ACL 2023
Deep learning has made significant progress in the past decade and demonstrates potential to solve problems with extensive social impact. In high-stakes decision-making areas such as law, experts often require interpretability for automatic systems to be utilized in practical settings. In this work, we address these requirements for the important problem of legal citation prediction (LCP). We design the task with parallels to the thought process of lawyers, i.e., with reference to both precedents and legislative provisions. After initial experimental results, we refine the target citation predictions with the feedback of legal experts. Additionally, we introduce a prototype architecture to add interpretability, achieving strong performance while adhering to decision parameters used by lawyers. Our study builds on and leverages state-of-the-art language processing models for law, while addressing vital considerations for high-stakes tasks with practical societal impact.
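For readers unfamiliar with prototype-based architectures, the sketch below shows one common way such a classification head can be built: an input encoding is compared against learned prototype vectors, and class scores are a linear function of those similarities, so a prediction can be traced back to the prototypes that drove it. The PrototypeHead module, its dimensions, and the distance-based similarity are illustrative assumptions, not the specific architecture reported in the paper.

```python
# Minimal sketch of a prototype-style classification head, in the spirit of
# prototype-based interpretability. Dimensions, the encoder, and the similarity
# measure are illustrative assumptions, not the paper's reported architecture.
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    def __init__(self, hidden_dim: int, num_prototypes: int, num_classes: int):
        super().__init__()
        # Learned prototype vectors living in the encoder's embedding space.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, hidden_dim))
        # Maps prototype similarities to class (citation) logits.
        self.classifier = nn.Linear(num_prototypes, num_classes)

    def forward(self, encoding: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Negative squared Euclidean distance serves as a similarity score.
        dists = torch.cdist(encoding, self.prototypes)   # (batch, num_prototypes)
        similarities = -dists.pow(2)
        logits = self.classifier(similarities)
        # Returning similarities lets us inspect which prototypes drove a prediction.
        return logits, similarities

# Example: a batch of two pooled document encodings from some text encoder.
head = PrototypeHead(hidden_dim=768, num_prototypes=16, num_classes=50)
logits, sims = head(torch.randn(2, 768))
print(logits.shape, sims.shape)
```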