Zelong Yu
2026
Self-Reinforcing Controllable Synthesis of Rare Relational Data via Bayesian Calibration
Chongsheng Zhang | Hao Wang | Zelong Yu | Esteban Garces Arias | Julian Rodemann | Zhanshuo Zhang | Qilong Li | Gaojuan Fan | Krikamol Muandet | Christian Heumann
Findings of the Association for Computational Linguistics: ACL 2026
Imbalanced data are commonly present in real-world applications. While data synthesis can effectively mitigate data scarcity for rare classes, and LLMs have revolutionized text generation, the application of LLMs to the synthesis of relational/structured tabular data remains underexplored. Moreover, existing approaches lack an effective feedback mechanism to guide LLMs in continuously optimizing the quality of the generated data throughout the synthesis process. In this work, we propose RDDG, Relational Data generator with Dynamic Guidance, which is a unified in-context learning framework that employs progressive chain-of-thought (CoT) steps to generate tabular data for enhancing downstream imbalanced classification performance. RDDG first uses core set selection to identify representative samples from the original data, then utilizes in-context learning to discover the inherent patterns and correlations among attributes within the core set, and subsequently generates tabular data while preserving the aforementioned constraints. More importantly, it incorporates a self-reinforcing feedback mechanism that provides automatic assessments of the quality of the generated data, enabling continuous quality optimization throughout the generation process. Experimental results on multiple real and synthetic datasets demonstrate that RDDG outperforms existing approaches in both data fidelity and downstream imbalanced classification performance.
2025
LlmFixer: Fix the Helpfulness of Defensive Large Language Models
Zelong Yu | Xiaoming Zhang | Litian Zhang | Yu Yuan | Chaozhuo Li
Findings of the Association for Computational Linguistics: EMNLP 2025
Defense strategies beyond alignment have been introduced to protect large language models against jailbreak attacks, and they have reduced the success rate of such attacks. However, these defense strategies also weaken the helpfulness of large language models. In this work, we propose a universal framework, LlmFixer, which acts on large language models equipped with any defense strategy to recover their original helpfulness. LlmFixer consists of an input prompt re-writer and a logic patch. The prompt re-writer is a pre-model that clarifies the intention of input prompts, encouraging large language models to be more helpful on benign inputs and more likely to reject malicious inputs. The logic patch is a lightweight structure that enhances large language models’ comprehension capacity by supplementing certain logical relationships. Without updating the parameters of a defensive large language model, LlmFixer restores its helpfulness while preserving safety. Experiments on three large language models, five jailbreak attacks, and four defense strategies show the effectiveness of LlmFixer.
2024
MDS: A Fine-Grained Dataset for Multi-Modal Dialogue Summarization
Zhipeng Liu | Xiaoming Zhang | Litian Zhang | Zelong Yu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Due to the explosion of various dialogue scenes, summarizing the dialogue into a short message has drawn much attention recently. In the multi-modal dialogue scene, people tend to use tone and body language to illustrate their intentions. While traditional dialogue summarization has predominantly focused on textual content, this approach may overlook vital visual and audio information essential for understanding multi-modal interactions. Recognizing the established field of multi-modal dialogue summarization, we develop a new multi-modal dialogue summarization dataset (MDS), which aims to enhance the variety and scope of data available for this research area. MDS provides a demanding testbed for multi-modal dialogue summarization. Subsequently, we conducted a comparative analysis of various summarization techniques on MDS and found that the existing methods tend to produce redundant and incoherent summaries. All of the models generate unfaithful facts to some degree, suggesting future research directions. MDS is available at https://github.com/R00kkie/MDS.