Zheng Wei


2025

Uncertainty-Aware Iterative Preference Optimization for Enhanced LLM Reasoning
Lei Li | Hehuan Liu | Yaxin Zhou | ZhaoYang Gui | Xudong Weng | Yi Yuan | Zheng Wei | Zang Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Direct Preference Optimization (DPO) has recently emerged as an efficient and effective method for aligning large language models with human preferences. However, constructing high-quality preference datasets remains challenging, often requiring expensive manual annotation or labels from powerful LMs. Additionally, standard DPO performs suboptimally on complex reasoning tasks such as mathematical and code reasoning. In this paper, we introduce an approach that collects preference pairs through iterative sampling and execution feedback, tailored to the current learning state of the policy model (e.g., well-learned, mis-learned, and unlearned). To alleviate the failures of DPO and improve its applicability to reasoning tasks, we propose an iterative uncertainty-aware preference optimization method that achieves fine-grained preference control by assessing model confidence. We validate our approach on three reasoning tasks, covering five established reasoning datasets and one self-curated dataset. Our experimental results demonstrate an overall improvement of 3.6% over the standard DPO method and show that the model exhibits promising generalizability.
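
As a rough illustration of the two ideas in this abstract, the sketch below buckets prompts by learning state using execution feedback on sampled completions, and applies a confidence-weighted variant of the standard DPO loss. The thresholds, the `confidence` weighting, and all function names are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def classify_learning_state(num_correct: int, num_samples: int) -> str:
    """Bucket a prompt by how the current policy handles it, judged by
    execution feedback on sampled completions (thresholds are illustrative)."""
    ratio = num_correct / num_samples
    if ratio >= 0.8:
        return "well-learned"   # mostly solved; little preference signal left
    if ratio == 0.0:
        return "unlearned"      # never solved; needs stronger supervision
    return "mis-learned"        # sometimes solved; good source of preference pairs

def uncertainty_weighted_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                                  ref_chosen_logps, ref_rejected_logps,
                                  confidence, beta=0.1):
    """Standard DPO objective, re-weighted per pair by a confidence score in
    [0, 1] (a hypothetical stand-in for the paper's fine-grained control)."""
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    per_pair = -F.logsigmoid(logits)
    return (confidence * per_pair).mean()

# Example: three preference pairs with per-pair confidence weights.
pc = torch.tensor([-1.0, -0.5, -2.0]); pr = torch.tensor([-2.0, -1.5, -1.0])
rc = torch.tensor([-1.2, -0.7, -2.1]); rr = torch.tensor([-1.8, -1.4, -1.2])
conf = torch.tensor([1.0, 0.5, 0.2])
loss = uncertainty_weighted_dpo_loss(pc, pr, rc, rr, conf)
```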

LIST: Linearly Incremental SQL Translator for Single-Hop Reasoning, Generation and Verification
Kaiyuan Guan | Ruoxin Li | Xudong Guo | Zhenning Huang | Xudong Weng | Hehuan Liu | Zheng Wei | Zang Li
Findings of the Association for Computational Linguistics: ACL 2025

SQL queries often feature nested structures that require robust interaction with databases. Beyond the well-validated schema linking methods built on PLMs and LLMs, we introduce the Linearly Incremental SQL Translator (LIST), a novel algorithmic toolkit designed to leverage the notable reasoning and tool-interaction capabilities inherent in LLMs. LIST transforms complex SQL queries into grammatically verifiable sub-queries, arranged sequentially to reflect single-hop reasoning steps, enhancing both the granularity and accuracy of database interactions. With in-context learning, our experiments demonstrate significant improvements, achieving 60.56% and 56.32% on the BIRD dataset with GPT-4o and Llama-3-70B-Instruct, respectively. To the best of our knowledge, this is SOTA performance among non-schema-linking methods, and it also surpasses a series of schema-linking-based approaches at comparable or lower cost.
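
The sketch below illustrates the general idea of sequentially arranged, grammatically verifiable sub-queries, using SQLite's query planner as the grammar/schema check. The decomposition, table schema, and helper names are hypothetical and not taken from the paper.

```python
import sqlite3

def verify_subquery(conn: sqlite3.Connection, sql: str) -> bool:
    """Grammar/schema check: ask SQLite to plan the query without running it."""
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return True
    except sqlite3.Error:
        return False

def linearly_incremental_check(conn, subqueries):
    """Walk an ordered list of single-hop sub-queries (innermost first) and
    return the first step that fails verification, or None if all pass."""
    for step, sql in enumerate(subqueries, start=1):
        if not verify_subquery(conn, sql):
            return step
    return None

# Hypothetical decomposition of a nested query into single-hop steps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER, score REAL);
CREATE TABLE classes (name TEXT, teacher_id INTEGER);
""")
steps = [
    "SELECT id FROM students WHERE score > 90",
    "SELECT name FROM classes WHERE teacher_id IN "
    "(SELECT id FROM students WHERE score > 90)",
]
print(linearly_incremental_check(conn, steps))  # None -> every step verifies
```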

2024

WRP: Weight Recover Prune for Structured Sparsity
Zhendong Tan | Xingjun Zhang | Zheng Wei
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As the scale of Large Language Models (LLMs) increases, it becomes necessary to compress the models to reduce their substantial demand on computational resources. Network pruning significantly reduces model size by converting the weight matrix from a dense to a sparse data format. Current methodologies advocate one-shot pruning to avoid the expense of retraining, maintaining model performance at 50%-60% unstructured sparsity. Nevertheless, matrices at this level of sparsity cannot be treated as sparse matrices in practice, because storing the indices would incur significant costs. To mitigate this problem, NVIDIA introduced 2:4 structured sparsity. However, we observe a notable decline in model performance when adopting 2:4 structured sparsity due to its group constraints. In this paper, we introduce the Weight Recover Prune (WRP) approach. By recovering a minimal set of critical weights, WRP aims to enhance model performance while maintaining the efficiency of the compression. Our evaluation of WRP on the LLAMA2 and OPT models shows that it outperforms other 2:4-pattern one-shot pruning methods, while guaranteeing that the pruned model is about 60% of the dense model's size. Our code is available at: https://github.com/TanZhendong/WRP.
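
For context, the following sketch shows baseline 2:4 magnitude pruning (keep the two largest-magnitude weights in every group of four along the input dimension), which is the group constraint the abstract refers to. WRP's weight-recovery step is not reproduced here; the function name and shapes are illustrative.

```python
import torch

def two_four_mask(weight: torch.Tensor) -> torch.Tensor:
    """Baseline 2:4 magnitude mask: in every group of 4 consecutive weights
    along the input dimension, keep the 2 with largest |w|. WRP goes further
    by recovering a small set of critical weights on top of such a mask."""
    out_dim, in_dim = weight.shape
    assert in_dim % 4 == 0, "2:4 sparsity needs the input dim padded to a multiple of 4"
    groups = weight.abs().reshape(out_dim, in_dim // 4, 4)
    topk = groups.topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return mask.reshape(out_dim, in_dim)

# Usage: apply the mask to a toy weight matrix; exactly half the entries survive.
w = torch.randn(8, 16)
pruned = w * two_four_mask(w)
```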