Jihao Gu

2025

Recent advancements in Direct Preference Optimization (DPO) have significantly enhanced the alignment of Large Language Models (LLMs) with human preferences, owing to its simplicity and effectiveness. However, existing methods typically optimize a scalar score or ranking reward, thereby overlooking the multi-dimensional nature of human preferences. In this work, we propose to extend the preference of DPO to two dimensions: segments and aspects. We first introduce a 2D supervision dataset called HelpSteer-2D. For the segment dimension, we divide the response into sentences and assign scores to each segment. For the aspect dimension, we meticulously design several criteria covering the response quality rubrics. With the 2-dimensional signals as feedback, we develop a 2D-DPO framework, decomposing the overall objective into multi-segment and multi-aspect objectives. Extensive experiments on popular benchmarks demonstrate that 2D-DPO performs better than methods that optimize for scalar or 1-dimensional preferences.

The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it challenging to fully reflect these models’ knowledge capacity and reliability. In this paper, we introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA, aimed at assessing the visual factuality of LVLMs across 8 major topics and 56 subtopics. The key features of this benchmark include a focus on the Chinese language, diverse knowledge types, a multi-hop question construction, high-quality data, static consistency, and easy-to-evaluate through short answers. Moreover, we contribute a rigorous data construction pipeline and decouple the visual factuality into two parts: seeing the world (i.e., object recognition) and discovering knowledge. This decoupling allows us to analyze the capability boundaries and execution mechanisms of LVLMs. Subsequently, we evaluate 34 advanced open-source and closed-source models, revealing critical performance gaps within this field.

Multimodal Large Language Models (MLLMs) pose unique safety challenges due to their integration of visual and textual data, thereby introducing new dimensions of potential attacks and complex risk combinations. In this paper, we begin with a detailed analysis aimed at disentangling risks through step-by-step reasoning within multimodal inputs. We find that systematic multimodal risk disentanglement substantially enhances the risk awareness of MLLMs. Via leveraging the strong discriminative abilities of multimodal risk disentanglement, we further introduce DREAM ( Disentangling Risks to Enhance Safety Alignment in MLLMs), a novel approach that enhances safety alignment in MLLMs through supervised fine-tuning and iterative Reinforcement Learning from AI Feedback (RLAIF). Experimental results show that DREAM significantly boosts safety during both inference and training phases without compromising performance on normal tasks (namely oversafety), achieving a 16.17% improvement in the SIUO safe&effective score compared to GPT-4V.

2024

pdf bib abs
From Bottom to Top: Extending the Potential of Parameter Efficient Fine-Tuning
Jihao Gu | Zelin Wang | Yibo Zhang | Ziji Zhang | Ping Gong
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

With the proliferation of large language models, Parameter Efficient Fine-Tuning (PEFT) method, which freeze pre-trained parameters and only fine-tune a few task-specific parameters, are playing an increasingly important role. However, previous work primarily applied uniform operations across all layers of the model, overlooking the fact that different layers in a transformer store different information. In the process of exploration, We find that there is a significant differences in fine-tuning strategies between different layers, and fine-tuning only a subset of layers can even achieve comparable performance. Based on this, we propose the Hybrid LoRA-Prefix Tuning(HLPT) method, which uses enhanced LoRA and Prefix-tuning methods with learnable adaptive mechanism separately for the bottom and top layers, and the Half Hybrid LoRA-Prefix Tuning(H²LPT) method, which goes a step further, reducing the parameter count to nearly half by omitting fine-tuning in the middle layers. Extensive experiments with large language models on various downstream tasks provide strong evidence for the potential of PEFT focusing on different layers’ interactions and the effectiveness of our methods. Furthermore, we validate the robustness of these methods and their advantages in speeding up training convergence, reducing inference time requirements.