Zelong Yu


2026

Imbalanced data are commonly present in real-world applications. While data synthesis can effectively mitigate data scarcity for rare classes, and LLMs have revolutionized text generation, the application of LLMs to the synthesis of relational/structured tabular data remains underexplored. Moreover, existing approaches lack an effective feedback mechanism to guide LLMs in continuously optimizing the quality of the generated data throughout the synthesis process. In this work, we propose RDDG, Relational Data generator with Dynamic Guidance, which is a unified in-context learning framework that employs progressive chain-of-thought (CoT) steps to generate tabular data for enhancing downstream imbalanced classification performance. RDDG first uses core set selection to identify representative samples from the original data, then utilizes in-context learning to discover the inherent patterns and correlations among attributes within the core set, and subsequently generates tabular data while preserving the aforementioned constraints. More importantly, it incorporates a self-reinforcing feedback mechanism that provides automatic assessments of the quality of the generated data, enabling continuous quality optimization throughout the generation process. Experimental results on multiple real and synthetic datasets demonstrate that RDDG outperforms existing approaches in both data fidelity and downstream imbalanced classification performance.

2025

Defense strategies of large language models besides alignment are introduced to defend against jailbreak attacks, and they have managed to decrease the success rate of jailbreak attacks. However, these defense strategies weakened the helpfulness of large language models. In this work, we propose a universal framework, LlmFixer, acting on large language models equipped with any defense strategy to recover their original helpfulness. LlmFixer consists of an input prompt re-writer and a logic patch. The prompt re-writer is a pre-model for clarifying the intention of input prompts, which promotes large language models to be more helpful to benign inputs and more rejective to malicious inputs. The logic patch is a lightweight structure that enhances large language models’ comprehension capacity by supplementing certain logical relationships. Without updating the parameters of a defensive large language model, LlmFixer fixes its helpfulness while preserving safety. Experiments on three large language models, five jailbreak attacks, and four defense strategies show the effectiveness of LlmFixer.

2024

Due to the explosion of various dialogue scenes, summarizing the dialogue into a short message has drawn much attention recently. In the multi-modal dialogue scene, people tend to use tone and body language to illustrate their intentions. While traditional dialogue summarization has predominantly focused on textual content, this approach may overlook vital visual and audio information essential for understanding multi-modal interactions. Recognizing the established field of multi-modal dialogue summarization, we develop a new multi-modal dialogue summarization dataset (MDS), which aims to enhance the variety and scope of data available for this research area. MDS provides a demanding testbed for multi-modal dialogue summarization. Subsequently, we conducted a comparative analysis of various summarization techniques on MDS and found that the existing methods tend to produce redundant and incoherent summaries. All of the models generate unfaithful facts to some degree, suggesting future research directions. MDS is available at https://github.com/R00kkie/MDS.