Shuo Xing
2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing
|
Peiran Li
|
Yuping Wang
|
Ruizheng Bai
|
Yueqi Wang
|
Chan-Wei Hu
|
Chengxuan Qian
|
Huaxiu Yao
|
Zhengzhong Tu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
The emergence of large Vision Language Models (VLMs) has broadened the scope and capabilities of single-modal Large Language Models (LLMs) by integrating visual modalities, thereby unlocking transformative cross-modal applications in a variety of real-world scenarios. Despite their impressive performance, VLMs are prone to significant hallucinations, particularly in the form of cross-modal inconsistencies. Building on the success of Reinforcement Learning from Human Feedback (RLHF) in aligning LLMs, recent advancements have focused on applying direct preference optimization (DPO) on carefully curated datasets to mitigate these issues. Yet, such approaches typically introduce preference signals in a brute-force manner, neglecting the crucial role of visual information in the alignment process. In this paper, we introduce Re-Align, a novel alignment framework that leverages image retrieval to construct a dual-preference dataset, effectively incorporating both textual and visual preference signals. We further introduce rDPO, an extension of the standard direct preference optimization that incorporates an additional visual preference objective during fine-tuning. Our experimental results demonstrate that Re-Align not only mitigates hallucinations more effectively than previous methods but also yields significant performance gains in general visual question-answering (VQA) tasks. Moreover, we show that Re-Align maintains robustness and scalability across a wide range of VLM sizes and architectures. This work represents a significant step forward in aligning multimodal LLMs, paving the way for more reliable and effective cross-modal applications.
2024
Plum: Prompt Learning using Metaheuristics
Rui Pan
|
Shuo Xing
|
Shizhe Diao
|
Wenhe Sun
|
Xiang Liu
|
KaShun Shum
|
Jipeng Zhang
|
Renjie Pi
|
Tong Zhang
Findings of the Association for Computational Linguistics: ACL 2024
Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models. Special prompts, such as Chain-of-Thought, have even revealed previously unknown reasoning capabilities within these models. However, the progress of discovering effective prompts has been slow, driving a desire for general prompt optimization methods. Unfortunately, few existing prompt learning methods satisfy the criteria of being truly “general”, i.e., automatic, discrete, black-box, gradient-free, and interpretable all at once. In this paper, we introduce metaheuristics, a branch of discrete non-convex optimization methods with over 100 options, as a promising approach to prompt learning. Within our paradigm, we test six typical methods: hill climbing, simulated annealing, genetic algorithms with/without crossover, tabu search, and harmony search, demonstrating their effectiveness in white-box and black-box prompt learning. Furthermore, we show that these methods can be used to discover more human-understandable prompts that were previously unknown in both reasoning and image generation tasks, opening the door to a cornucopia of possibilities in prompt optimization.
Search
Fix author
Co-authors
- Ruizheng Bai 1
- Shizhe Diao 1
- Chan-Wei Hu 1
- Peiran Li 1
- Xiang Liu (刘祥, 刘响) 1
- show all...