Jiaxu Zhao


2025

pdf bib
Unmasking Style Sensitivity: A Causal Analysis of Bias Evaluation Instability in Large Language Models
Jiaxu Zhao | Meng Fang | Kun Zhang | Mykola Pechenizkiy
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Natural language processing applications are increasingly prevalent, but social biases in their outputs remain a critical challenge. While various bias evaluation methods have been proposed, these assessments show unexpected instability when input texts undergo minor stylistic changes. This paper conducts a comprehensive analysis of how different style transformations impact bias evaluation results across multiple language models and bias types using causal inference techniques. Our findings reveal that formality transformations significantly affect bias scores, with informal style showing substantial bias reductions (up to 8.33% in LLaMA-2-13B). We identify appearance bias, sexual orientation bias, and religious bias as most susceptible to style changes, with variations exceeding 20%. Larger models demonstrate greater sensitivity to stylistic variations, with bias measurements fluctuating up to 3.1% more than in smaller models. These results highlight critical limitations in current bias evaluation methods and emphasize the need for reliable and fair assessments of language models.

pdf bib
Understanding Large Language Model Vulnerabilities to Social Bias Attacks
Jiaxu Zhao | Meng Fang | Fanghua Ye | Ke Xu | Qin Zhang | Joey Tianyi Zhou | Mykola Pechenizkiy
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) have become foundational in human-computer interaction, demonstrating remarkable linguistic capabilities across various tasks. However, there is a growing concern about their potential to perpetuate social biases present in their training data. In this paper, we comprehensively investigate the vulnerabilities of contemporary LLMs to various social bias attacks, including prefix injection, refusal suppression, and learned attack prompts. We evaluate popular models such as LLaMA-2, GPT-3.5, and GPT-4 across gender, racial, and religious bias types. Our findings reveal that models are generally more susceptible to gender bias attacks compared to racial or religious biases. We also explore novel aspects such as cross-bias and multiple-bias attacks, finding varying degrees of transferability across bias types. Additionally, our results show that larger models and pretrained base models often exhibit higher susceptibility to bias attacks. These insights contribute to the development of more inclusive and ethically responsible LLMs, emphasizing the importance of understanding and mitigating potential bias vulnerabilities. We offer recommendations for model developers and users to enhance the robustness of LLMs against social bias attacks.

2024

pdf bib
More than Minorities and Majorities: Understanding Multilateral Bias in Language Generation
Jiaxu Zhao | Zijing Shi | Yitong Li | Yulong Pei | Ling Chen | Meng Fang | Mykola Pechenizkiy
Findings of the Association for Computational Linguistics: ACL 2024

Pretrained models learned from real corpora can often capture undesirable features, leading to bias issues against different demographic groups. Most existing studies on bias dataset construction or bias mitigation methods only focus on one demographic group pair to study a certain bias, e.g. black vs. white for racial bias. However, in real-world applications, there are more than two demographic groups that are at risk of the same bias. In this paper, we propose to analyze and reduce biases across multiple demographic groups. We collect and build a multi-demographic bias dataset including five commonly discussed bias dimensions. To mitigate multi-demographic bias, we adopt several novel debiasing methods, including regularisation-based and augmentation-based methods, as well as appropriate evaluation metrics for multi-demographic bias measurement. Experimental results on the proposed multi-demographic dataset show that a fairer model can be achieved using a multi-demographic debiasing approach. Also, the model debiased using the proposed multi-demographic debiasing methods can better transfer to unseen demographics without sacrificing the performance of the pretrained model.

pdf bib
CHAmbi: A New Benchmark on Chinese Ambiguity Challenges for Large Language Models
Qin Zhang | Sihan Cai | Jiaxu Zhao | Mykola Pechenizkiy | Meng Fang
Findings of the Association for Computational Linguistics: EMNLP 2024

Ambiguity is an inherent feature of language, whose management is crucial for effective communication and collaboration. This is particularly true for Chinese, a language with extensive lexical-morphemic ambiguity. Despite the wide use of large language models (LLMs) in numerous domains and their growing proficiency in Chinese, there is a notable lack of datasets to thoroughly evaluate LLMs’ ability to handle ambiguity in Chinese. To bridge this gap, we introduce the CHAmbi dataset, a specialized Chinese multi-label disambiguation dataset formatted in Natural Language Inference. It comprises 4,991 pairs of premises and hypotheses, including 824 examples featuring a wide range of ambiguities. In addition to the dataset, we develop a series of tests and conduct an extensive evaluation of pre-trained LLMs’ proficiency in identifying and resolving ambiguity in the Chinese language. Our findings reveal that GPT-4 consistently delivers commendable performance across various evaluative measures, albeit with limitations in robustness. The performances of other LLMs, however, demonstrate variability in handling ambiguity-related tasks, underscoring the complexity of such tasks in the context of Chinese. The overall results highlight the challenge of ambiguity handling for current LLMs and underscore the imperative need for further enhancement in LLM capabilities for effective ambiguity resolution in the Chinese language.

2023

pdf bib
CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models
Jiaxu Zhao | Meng Fang | Zijing Shi | Yitong Li | Ling Chen | Mykola Pechenizkiy
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

redWarning: This paper contains content that may be offensive or upsetting.Pretrained conversational agents have been exposed to safety issues, exhibiting a range of stereotypical human biases such as gender bias. However, there are still limited bias categories in current research, and most of them only focus on English. In this paper, we introduce a new Chinese dataset, CHBias, for bias evaluation and mitigation of Chinese conversational language models.Apart from those previous well-explored bias categories, CHBias includes under-explored bias categories, such as ageism and appearance biases, which received less attention. We evaluate two popular pretrained Chinese conversational models, CDial-GPT and EVA2.0, using CHBias. Furthermore, to mitigate different biases, we apply several debiasing methods to the Chinese pretrained models. Experimental results show that these Chinese pretrained models are potentially risky for generating texts that contain social biases, and debiasing methods using the proposed dataset can make response generation less biased while preserving the models’ conversational capabilities.