Yongkang Du
2026
Controllable Pareto Trade-off between Fairness and Accuracy
Yongkang Du | Jieyu Zhao | Yijun Yang | Tianyi Zhou
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
The fairness-accuracy trade-off is a key challenge in NLP tasks. Current work focuses on finding a single optimal solution that balances the two objectives, which is limiting given the diverse solutions on the Pareto front. This work aims to provide controllable trade-offs according to the user’s preference over the two objectives, which is defined as a reference vector. To achieve this goal, we apply multi-objective optimization (MOO), which can find solutions from various regions of the Pareto front. However, precisely controlling the trade-off is challenging due to the stochasticity of the training process and the high-dimensional gradient vectors. Thus, we propose Controllable Pareto Trade-off (CPT), which can effectively train models to perform different trade-offs according to users’ preferences. CPT 1) stabilizes the fairness update with a moving average of stochastic gradients to determine the update direction, and 2) prunes the gradients by keeping only the gradients of the critical parameters. We evaluate CPT on hate speech detection and occupation classification tasks. Experiments show that CPT can achieve a higher-quality set of solutions on the Pareto front than the baseline methods. It also exhibits better controllability and can precisely follow the human-defined reference vectors.
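The two stabilization steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the exponential form of the moving average, the decay factor `beta`, and the choice of "critical parameters" as the `k` largest-magnitude entries are all assumptions made here for illustration, and the function names are hypothetical.

```python
def ema_update(avg, grad, beta=0.9):
    """Blend a new stochastic gradient into a running average.

    Assumption: an exponential moving average with decay `beta`;
    the abstract only says "moving average".
    """
    return [beta * a + (1.0 - beta) * g for a, g in zip(avg, grad)]


def prune_top_k(grad, k):
    """Zero every entry except the k with the largest magnitude.

    Assumption: "critical parameters" are approximated by gradient
    magnitude; the abstract does not state the selection criterion.
    """
    if k <= 0:
        return [0.0] * len(grad)
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def fairness_direction(stochastic_grads, beta=0.9, k=2):
    """Average a sequence of noisy gradients, then prune the result."""
    avg = [0.0] * len(stochastic_grads[0])
    for grad in stochastic_grads:
        avg = ema_update(avg, grad, beta)
    return prune_top_k(avg, k)
```

For example, `fairness_direction(batch_grads, beta=0.9, k=100)` would smooth a list of per-batch fairness gradients and keep only 100 coordinates; in this sketch both `beta` and `k` are free hyperparameters rather than values taken from the paper.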
2024
Self-contradictory reasoning evaluation and detection
Ziyi Liu | Soumya Sanyal | Isabelle Lee | Yongkang Du | Rahul Gupta | Yang Liu | Jieyu Zhao
Findings of the Association for Computational Linguistics: EMNLP 2024
In a plethora of recent work, large language models (LLMs) have demonstrated impressive reasoning ability, but many proposed downstream reasoning tasks focus only on performance-wise evaluation. Two fundamental questions persist: 1) how consistent is the reasoning, and 2) can models detect unreliable reasoning? In this paper, we investigate self-contradictory (Self-Contra) reasoning, where the model's reasoning does not support its answers. To answer 1), we define and assess the Self-Contra rate across three datasets and delve into finer-grained categories of Self-Contra reasoning. We find that LLMs often contradict themselves in reasoning tasks involving contextual information understanding or commonsense. The model may generate correct answers by taking shortcuts in reasoning or overlooking contextual evidence, leading to compromised reasoning. For 2), we task the state-of-the-art model GPT-4 with identifying Self-Contra reasoning and finer-grained fallacies. We find that finer-grained aided detection can improve GPT-4’s ability to detect Self-Contra. However, GPT-4 detects Self-Contra with only a 52.2% F1 score, much lower than the 66.7% achieved by humans. Our results indicate that current LLMs lack the robustness necessary for reliable reasoning, and we emphasize the urgent need to establish best practices for comprehensive reasoning evaluations beyond pure performance-based metrics.