Yingji Li

2026

From Fake to Real: Mitigating Out-of-Distribution Bias in In-Context Learning via Feedback Supervision from Large Language Models
Rui Song | Yingji Li | Jian Li | Fausto Giunchiglia | Hao Xu
Findings of the Association for Computational Linguistics: ACL 2026

With the rapid development of Large Language Models (LLMs), In-Context Learning (ICL) has emerged as one of the universal paradigms for unleashing the capabilities of LLMs. However, LLMs are generally plagued by various biases in context example selection, which can distort the model’s predictions. Although extensive research has focused on designing heuristic sample selection methods to mitigate biases in ICL, these approaches often struggle to adapt to highly biased out-of-distribution (OOD) scenarios with significant shifts between test samples and context samples. To overcome the aforementioned issue, this paper proposes a LLM-driven iterative derivation method for OOD data pseudo-labeling (named LPL), aiming to mitigate the risk of performance degradation caused by OOD bias by avoiding direct use of source data. To mitigate the misleading effects of noise in pseudo-labels, we propose a filtering metric that integrates model confidence and perturbation perplexity to enhance the quality of pseudo-labels. Subsequently, in each iteration, LPL utilizes this metric to expand new pseudo-labeled data as contextual demonstrations and ultimately adopts a voting mechanism to ensure the stability of the predictions. A series of experiments conducted on various LLMs have confirmed that our proposed method can effectively reduce OOD biases, thereby opening up new avenues for research in ICL biases.

pdf bib abs

Distilling the Essence, Discarding the Dross: Improving Fairness in Multimodal Large Language Models via Historical Reflection-Guided Prompt Optimization
Juncheng Hu | Jiming Yu | Rui Song | Kedi Lyu | Yingji Li | Zheli Liu
Findings of the Association for Computational Linguistics: ACL 2026

ocial bias in Multimodal Large Language Models (MLLMs) has become an increasingly important concern. Prompt-based approaches offer a lightweight solution for debiasing; however, existing methods rely heavily on handcrafted prompts that are brittle, highly context-sensitive, and difficult to generalize across tasks, bias types, and multimodal settings. In this work, we propose Historical Reflection-Guided Prompt Optimization (HRPO), an adaptive self-debiasing framework for black-box MLLMs that automatically optimizes task-specific debiasing prompts to suppress stereotypical outputs. To mitigate forgetting during prompt optimization, we introduce Historical Contrastive Self-Reflection (HCSR), which performs contrastive reflection over positive and negative optimization histories, enabling the model to retain effective prompts and avoid redundant exploration, thereby improving optimization efficiency. Experiments on three benchmarks involving eight open-source and two closed-source MLLMs, covering ten singular and two intersectional bias types, demonstrate that HRPO achieves strong debiasing performance while offering improved interpretability, generalization, and robustness. Code is available at: https://github.com/liyingji1996/HRPO.

pdf bib abs

Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval
Hao Xu | Rite Bo | Fausto Giunchiglia | Yingji Li | Rui Song
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although studies have demonstrated that Large Language Models (LLMs) can perform well on Out-of-Distribution (OOD) tasks, their advantage tends to diminish as the distribution shift becomes more severe. Consequently, researchers aim to retrieve distributionally similar and informative demonstrations from the available source domain to boost the inference capabilities of LLMs. However, in practical scenarios where the target domain is inaccessible, evaluating the unknown distribution is challenging, which indirectly impacts the quality of the selected demonstrations. To address this problem, we propose DOPA, a demonstration search framework that incorporates an OOD proxy to approximate the inaccessible target domain and guide the retrieval process. Building on proxy-based evaluation, DOPA further introduces a Mahalanobis distance-based global diversity constraint to ensure sufficient diversity among the retrieved demonstrations. Experimental results on multiple LLMs and tasks demonstrate that DOPA effectively enhances robustness in OOD settings.

pdf bib abs

Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning
Rui Song | Lida Shi | Ruihua Qi | Yingji Li | Hao Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In recent years, rapid advances in Multimodal Large Language Models (MLLMs) have increasingly stimulated research on ancient Chinese scripts. As the evolution of written characters constitutes a fundamental pathway for understanding cultural transformation and historical continuity, how MLLMs can be systematically leveraged to support and advance text evolution analysis remains an open and largely underexplored problem. To bridge this gap, we construct a comprehensive benchmark comprising 11 tasks and over 130,000 instances, specifically designed to evaluate the capability of MLLMs in analyzing the evolution of ancient Chinese scripts. We conduct extensive evaluations across multiple widely used MLLMs and observe that, while existing models demonstrate a limited ability in glyph-level comparison, their performance on core tasks-such as character recognition and evolutionary reasoning-remains substantially constrained. Motivated by these findings, we propose a glyph-driven fine-tuning framework (GEVO) that explicitly encourages models to capture evolutionary consistency in glyph transformations and enhances their understanding of text evolution. Experimental results show that even models at the 2B scale achieve consistent and comprehensive performance improvements across all evaluated tasks. To facilitate future research, we publicly release both the benchmark and the trained models.

2024

pdf bib abs

Data-Centric Explainable Debiasing for Improving Fairness in Pre-trained Language Models
Yingji Li | Mengnan Du | Rui Song | Xin Wang | Ying Wang
Findings of the Association for Computational Linguistics: ACL 2024

Human-like social bias of pre-trained language models (PLMs) on downstream tasks have attracted increasing attention. The potential flaws in the training data are the main factor that causes unfairness in PLMs. Existing data-centric debiasing strategies mainly leverage explicit bias words (defined as sensitive attribute words specific to demographic groups) for counterfactual data augmentation to balance the training data. However, they lack consideration of implicit bias words potentially associated with explicit bias words in complex distribution data, which indirectly harms the fairness of PLMs. To this end, we propose a **Data**-Centric **Debias**ing method (named Data-Debias), which uses an explainability method to search for implicit bias words to assist in debiasing PLMs. Specifically, we compute the feature attributions of all tokens using the Integrated Gradients method, and then treat the tokens that have a large impact on the model’s decision as implicit bias words. To make the search results more precise, we iteratively train a biased model to amplify the bias with each iteration. Finally, we use the implicit bias words searched in the last iteration to assist in debiasing PLMs. Extensive experimental results on multiple PLMs debiasing on three different classification tasks demonstrate that Data-Debias achieves state-of-the-art debiasing performance and strong generalization while maintaining predictive abilities.

2023

pdf bib abs

Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
Yingji Li | Mengnan Du | Xin Wang | Ying Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As the representation capability of Pre-trained Language Models (PLMs) improve, there is growing concern that they will inherit social biases from unprocessed corpora. Most previous debiasing techniques used Counterfactual Data Augmentation (CDA) to balance the training corpus. However, CDA slightly modifies the original corpus, limiting the representation distance between different demographic groups to a narrow range. As a result, the debiasing model easily fits the differences between counterfactual pairs, which affects its debiasing performance with limited text resources. In this paper, we propose an adversarial training-inspired two-stage debiasing model using Contrastive learning with Continuous Prompt Augmentation (named CCPA) to mitigate social biases in PLMs’ encoding. In the first stage, we propose a data augmentation method based on continuous prompt tuning to push farther the representation distance between sample pairs along different demographic groups. In the second stage, we utilize contrastive learning to pull closer the representation distance between the augmented sample pairs and then fine-tune PLMs’ parameters to get debiased encoding. Our approach guides the model to achieve stronger debiasing performance by adding difficulty to the training process. Extensive experiments show that CCPA outperforms baselines in terms of debiasing performance. Meanwhile, experimental results on the GLUE benchmark show that CCPA retains the language modeling capability of PLMs.

Co-authors

Jian Li 1

Venues

ACL3
Findings3

Fix author