Shiqi Wang
Papers on this page may belong to the following people: Shiqi Wang, Shiqi Wang
2025
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
Zhengze Zhang | Shiqi Wang | Yiqun Shen | Simin Guo | Dahua Lin | Xiaoliang Wang | Cam Tu Nguyen | Fei Tan
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have demonstrated exceptional performance across various applications, but their conversational abilities decline sharply as model size decreases, presenting a barrier to their deployment in resource-constrained environments. Knowledge distillation (KD) with Direct Preference Optimization (DPO) has emerged as a promising approach to enhance the conversational abilities of smaller models using a larger teacher model. However, current methods primarily focus on “black-box” KD, which only uses the teacher’s responses, overlooking the rich distributional information within the teacher’s probability distribution. This paper addresses this gap by introducing daDPO (Distribution-Aware DPO), a novel framework that integrates the teacher’s distributional information into DPO distillation while preserving theoretical guarantees. Our framework offers a unified objective that enhances both preference optimization and distribution-based distillation. We provide rigorous theoretical analysis and empirical validation, showing that daDPO outperforms existing methods in restoring performance for pruned models and enhancing smaller models within the same LLM family. Notably, in in-domain evaluation, our method enables a 20% pruned Vicuna1.5-7B to achieve near-teacher performance (-7.3% preference rate), and allows Qwen2.5-1.5B to occasionally outperform its 7B teacher model (14.0% win rate).
MicroEdit: Neuron-level Knowledge Disentanglement and Localization in Lifelong Model Editing
Shiqi Wang | Qi Wang | Runliang Niu | He Kong | Yi Chang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) require continual knowledge updates to keep pace with the evolving world. While various model editing methods have been proposed, most face critical challenges in the context of lifelong learning due to two fundamental limitations: (1) Edit Overshooting - parameter updates intended for a specific fact spill over to unrelated regions, causing interference with previously retained knowledge; and (2) Knowledge Entanglement - polysemantic neurons’ overlapping encoding of multiple concepts makes it difficult to isolate and edit a single fact. In this paper, we propose MicroEdit, a neuron-level editing method that performs minimal and controlled interventions within LLMs. By leveraging a sparse autoencoder (SAE), MicroEdit disentangles knowledge representations and activates only a minimal set of necessary neurons for precise parameter updates. This targeted design enables fine-grained control over the editing scope, effectively mitigating interference and preserving unrelated knowledge. Extensive experiments show that MicroEdit outperforms prior methods and robustly handles lifelong knowledge editing across QA and Hallucination settings on LLaMA and Mistral.
Generating Commonsense Reasoning Questions with Controllable Complexity through Multi-step Structural Composition
Jianxing Yu | Shiqi Wang | Hanjiang Lai | Wenqing Chen | Yanghui Rao | Qinliang Su | Jian Yin
Proceedings of the 31st International Conference on Computational Linguistics
This paper studies the task of generating commonsense reasoning questions (QG) with desired difficulty levels. Compared to traditional shallow questions that can be solved by simple term matching, ours are more challenging. Our answering process requires reasoning over multiple contextual and commonsense clues. That involves advanced comprehension skills, such as abstract semantics learning and missing knowledge inference. Existing work mostly learns to map the given text into questions, lacking a mechanism to control results with the desired complexity. To address this problem, we propose a novel controllable framework. We first derive contextual and commonsense clues involved in reasoning questions from the text. These clues are used to create simple sub-questions. We then aggregate multiple sub-questions to compose complex ones under the guidance of prior reasoning structures. By iterating this process, we can compose a complex QG task based on a series of smaller and simpler QG subtasks. Each subtask serves as a building block for a larger one. Each composition corresponds to an increase in the reasoning step. Moreover, we design a voting verifier to ensure results’ validity from multiple views, including answer consistency, reasoning difficulty, and context correlation. Finally, we can learn the optimal QG model to yield thought-provoking results. Evaluations on two typical datasets validate our method.
2024
ReTA: Recursively Thinking Ahead to Improve the Strategic Reasoning of Large Language Models
Jinhao Duan | Shiqi Wang | James Diffenderfer | Lichao Sun | Tianlong Chen | Bhavya Kailkhura | Kaidi Xu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Current logical reasoning evaluations of Large Language Models (LLMs) primarily focus on single-turn and static environments, such as arithmetic problems. The crucial problem of multi-turn, strategic reasoning is under-explored. In this work, we analyze the multi-turn strategic reasoning of LLMs through text-driven complete- and incomplete-information gaming, e.g., board games (Tic-Tac-Toe, Connect-4) and poker games (Texas Hold’em Poker). Specifically, we consider two distinct scenarios: 1) Online Racing, featuring multiple LLMs/agents to facilitate direct competition and comparison; 2) Offline Probing, constructing targeted questions with verified ground truth to evaluate LLMs’ strategic behaviors. Experimental results demonstrate that existing state-of-the-art LLMs and reasoning schemes are largely ineffective for strategic reasoning tasks. To mitigate these limitations, we propose a simple yet effective Recursively Thinking-Ahead (ReTA) agent, incorporating a recursive prompting mechanism that automatically analyzes the opponents’ future moves/actions and assigns reward signals for these situations, to strengthen the strategic reasoning of LLMs. We hope our work could spur further research and exploration in the multi-turn strategic reasoning of LLMs. The code is available at https://github.com/jinhaoduan/ReTA.
Reward Difference Optimization For Sample Reweighting In Offline RLHF
Shiqi Wang | Zhengze Zhang | Rui Zhao | Fei Tan | Cam Tu Nguyen
Findings of the Association for Computational Linguistics: EMNLP 2024
With the wide deployment of Large Language Models (LLMs), aligning LLMs with human values becomes increasingly important. Although Reinforcement Learning with Human Feedback (RLHF) proves effective, it is complicated and highly resource-intensive. As such, offline RLHF has been introduced as an alternative solution, which directly optimizes LLMs with ranking losses on a fixed preference dataset. Current offline RLHF only captures the ordering relationship between responses, overlooking the crucial aspect of “how much” one is preferred over the others. To address this issue, we propose a simple yet effective solution based on reward difference prediction. Specifically, we introduce reward difference coefficients to reweight sample pairs in offline RLHF. We then propose a difference model that considers rich interactions between a pair of responses for predicting these difference coefficients. Experiments with 7B LLMs on the HH and TL;DR datasets verify the effectiveness of our method in both automatic metrics and human evaluation, highlighting its potential for aligning LLMs with human values.
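The idea of reweighting preference pairs by "how much" one response is preferred can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `weighted_pairwise_loss`, the use of a logistic ranking loss, and the normalization by the sum of weights are all assumptions made for illustration. The reward differences are taken as given inputs here, whereas the abstract describes predicting them with a separate difference model.

```python
import math


def weighted_pairwise_loss(margins, reward_diffs):
    """Logistic ranking loss with per-pair weights (illustrative only).

    margins: for each pair, score(chosen) - score(rejected) under the policy.
    reward_diffs: non-negative weight per pair (hypothetical reward-difference
        coefficients; assumed to be supplied by some external model).
    """
    if len(margins) != len(reward_diffs):
        raise ValueError("margins and reward_diffs must have the same length")
    total_weight = sum(reward_diffs)
    if total_weight <= 0:
        raise ValueError("reward_diffs must sum to a positive value")

    loss = 0.0
    for margin, weight in zip(margins, reward_diffs):
        # -log(sigmoid(margin)) written as log(1 + exp(-margin))
        loss += weight * math.log1p(math.exp(-margin))
    return loss / total_weight
```

With uniform weights this reduces to an ordinary averaged pairwise ranking loss; a pair with a larger weight contributes more when the policy ranks it incorrectly.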
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Jianxing Yu | Shiqi Wang | Han Yin | Zhenlong Sun | Ruobing Xie | Bo Zhang | Yanghui Rao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
This paper focuses on detecting clickbait posts on the Web. These posts often use eye-catching disinformation in mixed modalities to mislead users to click for profit. This degrades the user experience, so such posts would be blocked by content providers. To escape detection, malicious creators use tricks to add some irrelevant non-bait content into bait posts, dressing them up as legitimate to fool the detector. This content often has biased relations with non-bait labels, yet traditional detectors tend to make predictions based on simple co-occurrence rather than grasping inherent factors that lead to malicious behavior. This spurious bias would easily cause misjudgments. To address this problem, we propose a new debiased method based on causal inference. We first employ a set of features in multiple modalities to characterize the posts. Considering these features are often mixed up with unknown biases, we then disentangle three kinds of latent factors from them: the invariant factor that indicates intrinsic bait intention, the causal factor that reflects deceptive patterns in a certain scenario, and non-causal noise. By eliminating the noise that causes bias, we can use invariant and causal factors to build a robust model with good generalization ability. Experiments on three popular datasets show the effectiveness of our approach.
Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
Jinhao Duan | Hao Cheng | Shiqi Wang | Alex Zavalny | Chenan Wang | Renjing Xu | Bhavya Kailkhura | Kaidi Xu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) show promising results in language generation and instruction following but frequently “hallucinate”, making their outputs less reliable. Despite Uncertainty Quantification’s (UQ) potential solutions, implementing it accurately within LLMs is challenging. Our research introduces a simple heuristic: not all tokens in auto-regressive LLM text equally represent the underlying meaning, as “linguistic redundancy” often allows a few keywords to convey the essence of long sentences. However, current methods underestimate this inequality when assessing uncertainty, causing tokens with limited semantics to be equally or excessively weighted in UQ. To correct this, we propose Shifting Attention to more Relevant (SAR) components at both token- and sentence-levels for better UQ. We conduct extensive experiments involving a range of popular “off-the-shelf” LLMs, such as Vicuna, WizardLM, and LLaMA-2-chat, with model sizes extending up to 33B parameters. We evaluate various free-form question-answering tasks, encompassing domains such as reading comprehension, science Q&A, and medical Q&A. Our experimental results, coupled with a comprehensive demographic analysis, demonstrate the superior performance of SAR. The code is available at https://github.com/jinhaoduan/SAR.
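The token-level intuition, that tokens carrying more meaning should count more toward the uncertainty estimate, can be sketched as a relevance-weighted average of token negative log-probabilities. This is an illustrative simplification, not the SAR implementation from the linked repository: the function name `relevance_weighted_uncertainty` is hypothetical, and the relevance scores are assumed to be supplied by the caller rather than computed from a semantic similarity model.

```python
def relevance_weighted_uncertainty(token_logprobs, relevance):
    """Weighted average of token negative log-probabilities (illustrative).

    token_logprobs: log-probability the model assigned to each generated token.
    relevance: non-negative relevance score per token (hypothetical input;
        how these scores are obtained is outside this sketch).
    """
    if len(token_logprobs) != len(relevance):
        raise ValueError("token_logprobs and relevance must have the same length")
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance scores must sum to a positive value")

    # Normalize relevance so the weights sum to 1.
    weights = [r / total for r in relevance]
    return sum(-lp * w for lp, w in zip(token_logprobs, weights))
```

With equal relevance scores this is the plain mean token negative log-probability; raising the relevance of an uncertain keyword raises the overall estimate.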
2023
Generating Deep Questions with Commonsense Reasoning Ability from the Text by Disentangled Adversarial Inference
Jianxing Yu | Shiqi Wang | Libin Zheng | Qinliang Su | Wei Liu | Baoquan Zhao | Jian Yin
Findings of the Association for Computational Linguistics: ACL 2023
This paper proposes a new task of commonsense question generation, which aims to yield deep-level and to-the-point questions from the text. Their answers need to reason over disjoint relevant contexts and external commonsense knowledge, such as encyclopedic facts and causality. The knowledge may not be explicitly mentioned in the text but is used by most humans for problem-solving. Such complex reasoning with hidden contexts involves deep semantic understanding. Thus, this task has great application value, such as making high-quality quizzes in advanced exams. Because they do not model complexity, existing methods may produce shallow questions that can be answered by simple word matching. To address these challenges, we propose a new QG model by simultaneously considering asking contents, expressive ways, and answering complexity. We first retrieve text-related commonsense context. Then we disentangle the key factors that control questions in terms of reasoning content and verbalized way. Independence priors and constraints are imposed to facilitate disentanglement. We further develop a discriminator to promote the deep results by considering their answering complexity. Through adversarial inference, we learn the latent factors from data. By sampling the expressive factor from the data distributions, diverse questions can be yielded. Evaluations on two typical datasets show the effectiveness of our approach.
Co-authors
- Jianxing Yu 3
- Jinhao Duan 2
- Bhavya Kailkhura 2
- Cam-Tu Nguyen 2
- Yanghui Rao 2
- Qinliang Su 2
- Fei Tan 2
- Kaidi Xu 2
- Jian Yin 2
- Zhengze Zhang 2
- Yi Chang 1
- Tianlong Chen 1
- Wenqing Chen 1
- Hao Cheng 1
- James Diffenderfer 1
- Simin Guo 1
- He Kong 1
- Hanjiang Lai 1
- Dahua Lin 1
- Wei Liu 1
- Runliang Niu 1
- Yiqun Shen 1
- Lichao Sun 1
- Zhenlong Sun 1
- Chenan Wang 1
- Qi Wang 1
- Xiaoliang Wang 1
- Ruobing Xie 1
- Renjing Xu 1
- Han Yin 1
- Alex Zavalny 1
- Bo Zhang 1
- Baoquan Zhao 1
- Rui Zhao 1
- Libin Zheng 1