Rui Song


2026

With the rapid development of Large Language Models (LLMs), In-Context Learning (ICL) has emerged as one of the universal paradigms for unleashing the capabilities of LLMs. However, LLMs are generally plagued by various biases in context example selection, which can distort the model’s predictions. Although extensive research has focused on designing heuristic sample selection methods to mitigate biases in ICL, these approaches often struggle to adapt to highly biased out-of-distribution (OOD) scenarios with significant shifts between test samples and context samples. To overcome this issue, this paper proposes an LLM-driven iterative derivation method for OOD data pseudo-labeling (named LPL), which aims to mitigate the risk of performance degradation caused by OOD bias by avoiding direct use of source data. To mitigate the misleading effects of noise in pseudo-labels, we propose a filtering metric that integrates model confidence and perturbation perplexity to enhance the quality of pseudo-labels. Subsequently, in each iteration, LPL uses this metric to add new pseudo-labeled data as contextual demonstrations, and ultimately adopts a voting mechanism to ensure the stability of the predictions. A series of experiments conducted on various LLMs has confirmed that our proposed method can effectively reduce OOD biases, thereby opening up new avenues for research in ICL biases.
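The filter-then-vote idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual metric, so the threshold rule, the function names, and the default values below are all assumptions made for illustration; this is not the LPL implementation.

```python
from collections import Counter


def keep_pseudo_label(confidence, perturbation_ppl, min_conf=0.8, max_ppl=20.0):
    """Hypothetical filter: keep a pseudo-label only if the model is confident
    and the perplexity under input perturbation stays low.
    The thresholds are illustrative assumptions, not values from the paper."""
    return confidence >= min_conf and perturbation_ppl <= max_ppl


def majority_vote(predictions):
    """Return the most frequent label among the per-iteration predictions."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Usage: filter candidate pseudo-labels, then vote over several iterations.
candidates = [("pos", 0.95, 8.0), ("neg", 0.60, 12.0), ("pos", 0.90, 45.0)]
kept = [label for label, conf, ppl in candidates if keep_pseudo_label(conf, ppl)]
final = majority_vote(["pos", "neg", "pos"])
```

Only the first candidate passes both thresholds here; the second fails on confidence and the third on perturbation perplexity.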
Social bias in Multimodal Large Language Models (MLLMs) has become an increasingly important concern. Prompt-based approaches offer a lightweight solution for debiasing; however, existing methods rely heavily on handcrafted prompts that are brittle, highly context-sensitive, and difficult to generalize across tasks, bias types, and multimodal settings. In this work, we propose Historical Reflection-Guided Prompt Optimization (HRPO), an adaptive self-debiasing framework for black-box MLLMs that automatically optimizes task-specific debiasing prompts to suppress stereotypical outputs. To mitigate forgetting during prompt optimization, we introduce Historical Contrastive Self-Reflection (HCSR), which performs contrastive reflection over positive and negative optimization histories, enabling the model to retain effective prompts and avoid redundant exploration, thereby improving optimization efficiency. Experiments on three benchmarks involving eight open-source and two closed-source MLLMs, covering ten singular and two intersectional bias types, demonstrate that HRPO achieves strong debiasing performance while offering improved interpretability, generalization, and robustness. Code is available at: https://github.com/liyingji1996/HRPO.
Although studies have demonstrated that Large Language Models (LLMs) can perform well on Out-of-Distribution (OOD) tasks, their advantage tends to diminish as the distribution shift becomes more severe. Consequently, researchers aim to retrieve distributionally similar and informative demonstrations from the available source domain to boost the inference capabilities of LLMs. However, in practical scenarios where the target domain is inaccessible, evaluating the unknown distribution is challenging, which indirectly impacts the quality of the selected demonstrations. To address this problem, we propose DOPA, a demonstration search framework that incorporates an OOD proxy to approximate the inaccessible target domain and guide the retrieval process. Building on proxy-based evaluation, DOPA further introduces a Mahalanobis distance-based global diversity constraint to ensure sufficient diversity among the retrieved demonstrations. Experimental results on multiple LLMs and tasks demonstrate that DOPA effectively enhances robustness in OOD settings.
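To make the Mahalanobis distance-based diversity constraint concrete, here is a minimal sketch. It assumes each demonstration is an embedding vector, that an inverse covariance matrix `cov_inv` has already been estimated, and that diversity is enforced with a greedy minimum-distance rule. The greedy rule, `select_diverse`, and `min_dist` are illustrative assumptions and not the DOPA procedure itself.

```python
import math


def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance sqrt((x - y)^T * cov_inv * (x - y))."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))


def select_diverse(candidates, scores, cov_inv, k, min_dist):
    """Greedy selection (an assumed scheme): visit candidates from highest to
    lowest score and keep one only if it is at least `min_dist` away from
    every demonstration already chosen."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        if all(mahalanobis(candidates[i], candidates[j], cov_inv) >= min_dist
               for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen


# Usage with an identity inverse covariance (reduces to Euclidean distance).
identity = [[1.0, 0.0], [0.0, 1.0]]
points = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [0.0, 4.0]]
proxy_scores = [0.9, 0.8, 0.7, 0.6]  # hypothetical proxy-based relevance scores
picked = select_diverse(points, proxy_scores, identity, k=3, min_dist=1.0)
```

The second point is skipped because it lies too close to the first, so the selection favors demonstrations that are both highly scored and spread out.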
Research on ancient Chinese language is of great significance for tracing Chinese history and civilization. In the field of large language models, studies on the pre-Qin excavated documents such as Oracle Bone Inscriptions, Bronze Inscriptions, and Bamboo Book of Chu remain insufficient. This is because these ancient characters have a low level of digitization, training corpora are extremely scarce, and they typically contain complex and rich semantic information. Therefore, we propose an ancient character semantic-aware embedding for large language models. This embedding integrates both the glyph and lexicality of ancient characters and maps them to the modern Chinese semantic space. We also design a two-stage method for lightweight and parameter-efficient training of the embedding. Finally, we conduct extensive experiments on excavated documents from the pre-Qin period, and the results demonstrate the effectiveness of our approach.
In recent years, rapid advances in Multimodal Large Language Models (MLLMs) have increasingly stimulated research on ancient Chinese scripts. As the evolution of written characters constitutes a fundamental pathway for understanding cultural transformation and historical continuity, how MLLMs can be systematically leveraged to support and advance text evolution analysis remains an open and largely underexplored problem. To bridge this gap, we construct a comprehensive benchmark comprising 11 tasks and over 130,000 instances, specifically designed to evaluate the capability of MLLMs in analyzing the evolution of ancient Chinese scripts. We conduct extensive evaluations across multiple widely used MLLMs and observe that, while existing models demonstrate a limited ability in glyph-level comparison, their performance on core tasks, such as character recognition and evolutionary reasoning, remains substantially constrained. Motivated by these findings, we propose a glyph-driven fine-tuning framework (GEVO) that explicitly encourages models to capture evolutionary consistency in glyph transformations and enhances their understanding of text evolution. Experimental results show that even models at the 2B scale achieve consistent and comprehensive performance improvements across all evaluated tasks. To facilitate future research, we publicly release both the benchmark and the trained models.

2025

Negotiation agents need to influence the attitudes or intentions of users to reach a consensus. Strategy planning and expressive optimization are crucial aspects of effective negotiations. However, previous studies have typically focused on only one of these aspects, neglecting the fact that their synergy can lead to better performance. Inspired by the dual-process theory in human cognition, we propose a Dual-Mind Negotiation Agent (DMNA) framework. This framework integrates an intuitive module for rapid, experience-based responses and a deliberative module for slow expression optimization. The intuitive module is trained using Monte Carlo Tree Search (MCTS) and Direct Preference Optimization (DPO), enabling it to perform suitable strategic planning and expression. The deliberative module employs a multifaceted reflexion mechanism to enhance the quality of expression. Experiments conducted on negotiation datasets confirm that DMNA achieves state-of-the-art results, demonstrating an enhancement in the negotiation ability of agents.

2024

Human-like social bias of pre-trained language models (PLMs) on downstream tasks has attracted increasing attention. The potential flaws in the training data are the main factor that causes unfairness in PLMs. Existing data-centric debiasing strategies mainly leverage explicit bias words (defined as sensitive attribute words specific to demographic groups) for counterfactual data augmentation to balance the training data. However, they do not consider implicit bias words potentially associated with explicit bias words in complex distribution data, which indirectly harms the fairness of PLMs. To this end, we propose a **Data**-Centric **Debias**ing method (named Data-Debias), which uses an explainability method to search for implicit bias words to assist in debiasing PLMs. Specifically, we compute the feature attributions of all tokens using the Integrated Gradients method, and then treat the tokens that have a large impact on the model’s decision as implicit bias words. To make the search results more precise, we iteratively train a biased model to amplify the bias with each iteration. Finally, we use the implicit bias words found in the last iteration to assist in debiasing PLMs. Extensive experimental results from debiasing multiple PLMs on three different classification tasks demonstrate that Data-Debias achieves state-of-the-art debiasing performance and strong generalization while maintaining predictive abilities.
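The Integrated Gradients attribution step can be sketched on a toy model. The example below assumes a small logistic model with an analytic gradient and an all-zeros baseline, and approximates the path integral with a midpoint Riemann sum; the model, weights, and function names are hypothetical stand-ins rather than the PLM setup used in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic model standing in for a classifier's output probability."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG_i = (x_i - x'_i) * integral over alpha in [0, 1] of
    dF/dx_i evaluated at x' + alpha * (x - x'), using the midpoint rule."""
    total = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Usage: features with the largest absolute attribution are the candidates
# that a search like the one described above would flag.
x = [1.0, 2.0, -1.0]
w = [0.5, -0.3, 0.8]  # hypothetical weights
b = 0.1
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum approximately to the difference between the model output at the input and at the baseline.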
Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These approaches commonly use a binary cross-entropy mechanism on pairwise samples, i.e., minimizing and maximizing the loss based on preferred or dis-preferred responses, respectively. However, while this training strategy omits the reward model, it also overlooks the varying preference degrees within different responses. We hypothesize that this is a key factor hindering LLMs from sufficiently understanding human preferences. To address this problem, we propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss, thereby helping LLMs improve their ability to understand the degree of preference. Extensive experiments are conducted on two widely used datasets of different tasks. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods and significantly boosts their performance to state-of-the-art levels. We also conduct detailed analyses to offer comprehensive insights into SPO, which verify its effectiveness. The code is available at https://github.com/lijian16/SPO.
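For reference, the pairwise binary cross-entropy objective mentioned in this abstract can be sketched in its standard DPO form for a single preference pair. This shows only the baseline DPO loss, not the SPO preference degree loss; the function name, argument names, and the default `beta` are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair:
    -log(sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))),
    where each argument is a sequence log-probability."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Usage: the loss shrinks as the policy prefers the chosen response more
# strongly than the reference model does.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
better = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

Because the loss depends only on the sign and size of one margin per pair, it treats every preferred/dis-preferred pair alike, which is the limitation regarding preference degree that the abstract points to.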