Xiangqing Shen

2026

Rerankers play a pivotal role in refining retrieval results for Retrieval-Augmented Generation. However, current reranking models are typically optimized on static human-annotated relevance labels in isolation, decoupled from the downstream generation process. This isolation leads to a fundamental misalignment: documents identified as topically relevant by information retrieval metrics often fail to provide the actual utility required by the LLM for precise answer generation. To bridge this gap, we introduce ReRanking Preference Optimization (RRPO), a reinforcement learning framework that directly aligns reranking with the LLM’s generation quality. By formulating reranking as a sequential decision-making process, RRPO optimizes for context utility using LLM feedback, thereby eliminating the need for expensive human annotations. To ensure training stability, we further introduce a reference-anchored deterministic baseline. Extensive experiments on knowledge-intensive benchmarks demonstrate that RRPO significantly outperforms strong baselines, including the powerful list-wise reranker RankZephyr. Further analysis highlights the versatility of our framework: it generalizes seamlessly to diverse readers (e.g., GPT-4o), integrates orthogonally with query expansion modules like Query2Doc, and remains robust even when trained with noisy supervisors.

Interactive Semantic Parsing with Reinforcement Learning for Knowledge Graph Reasoning
Yurun Song | Xiangqing Shen | Jianfei Yu | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2026

While large language models (LLMs) have achieved remarkable success, their reliability in knowledge-intensive tasks is often compromised by factual hallucinations. Integrating Knowledge Graphs (KGs) addresses this issue; however, existing approaches typically rely on simple graph traversal. This paradigm decouples topological navigation from logical operations (e.g., temporal filtering, aggregation), leading to imprecise retrieval and heavy post-processing burdens. Although semantic parsing offers a solution by grounding reasoning in logical forms, it traditionally suffers from a dependency on scarce supervised annotations. To bridge this gap, we propose Interactive Semantic Parsing, a framework that formulates reasoning as the sequential generation of executable logical clauses. This design allows logical constraints to be dynamically interleaved with graph search, while optimizing via reinforcement learning with only final answer feedback eliminates the need for gold program annotations. To tackle the sparse reward challenge in the vast symbolic space, we introduce a distance-aware process reward to evaluate intermediate steps based on their topological proximity to the answer. Experimental results on WebQSP and CWQ demonstrate that our method achieves state-of-the-art performance, particularly on complex queries, validating the effectiveness of our dense reward signal in enabling robust reasoning without supervision. Our code is available at https://github.com/NUSTM/ISP-KGR.

LoReFact: Bridging the Logic Gap in Fact-Checking
Qiming Xie | Wenjie Zheng | Xiangqing Shen | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2026

The rise of social media and generative AI has led to a surge of misinformation online, making reliable fact-checking increasingly critical. Most existing fact-checking research adheres to the decompose-then-verify paradigm, emphasizing verification of individual facts while overlooking the validity of logical dependencies among them. As a result, text containing logical errors may still be misjudged as factual. Moreover, existing datasets and metrics focus on fact completeness and coverage, failing to capture the logical dimension. To help bridge this gap, we propose a content–logic coupled factuality evaluation paradigm, which conceptualizes factuality along two complementary dimensions: content factuality and logic factuality. Under this paradigm, we introduce a holistic solution consisting of LoReFact, the first long-form fact-checking dataset that systematically incorporates the logical dimension; LoRe-Factcheck, a simple yet effective framework for joint content–logic evaluation; and a logic-aware metric named LoReFactScore for exposing and penalizing logical fallacies. Experiments demonstrate the importance of logical factuality and the effectiveness of our proposed paradigm for fact-checking. Our data and code are publicly available at https://github.com/NUSTM/LoReFact.

2025

VCD: A Dataset for Visual Commonsense Discovery in Images
Xiangqing Shen | Fanfan Wang | Siwei Wu | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2025

Visual commonsense plays a vital role in understanding and reasoning about the visual world. While commonsense knowledge bases like ConceptNet provide structured collections of general facts, they lack visually grounded representations. Scene graph datasets like Visual Genome, though rich in object-level descriptions, primarily focus on directly observable information and lack systematic categorization of commonsense knowledge. We present Visual Commonsense Dataset (VCD), a large-scale dataset containing over 100,000 images and 14 million object-commonsense pairs that bridges this gap. VCD introduces a novel three-level taxonomy for visual commonsense, integrating both Seen (directly observable) and Unseen (inferrable) commonsense across Property, Action, and Space aspects. Each commonsense is represented as a triple where the head entity is grounded to object bounding boxes in images, enabling scene-dependent and object-specific visual commonsense representation. To demonstrate VCD’s utility, we develop VCM, a generative model that combines a vision-language model with instruction tuning to discover diverse visual commonsense from images. Extensive evaluations demonstrate both the high quality of VCD and its value as a resource for advancing visually grounded commonsense understanding and reasoning. Our dataset and code will be released on https://github.com/NUSTM/VCD.

Flexible Thinking for Multimodal Emotional Support Conversation via Reinforcement Learning
Fanfan Wang | Xiangqing Shen | Jianfei Yu | Rui Xia
Findings of the Association for Computational Linguistics: EMNLP 2025

Emotional Support Conversation (ESC) systems aim to alleviate user distress. However, current Chain-of-Thought based ESC methods often employ rigid, text-only reasoning, limiting adaptability in dynamic, multimodal interactions and introducing reasoning noise that degrades support quality. To address this, we introduce “Flexible Thinking” for multimodal ESC, enabling models to adaptively select contextually relevant thinking aspects: Visual Scene, Emotion, Situation, and Response Strategy. We first construct training data by manually curating flexible thinking demonstrations on the MESC dataset, then using a Multimodal Large Language Model to synthesize these processes for the full training set. Then, we propose FIRES, a framework integrating Supervised Fine-Tuning (SFT) for initial learning with Reinforcement Learning for refinement. This two-stage approach helps FIRES transcend SFT’s generalization limits and, crucially, directly links thinking processes to response quality via tailored rewards, moving beyond imitating potentially imperfect synthetic data. Experiments on MESC and EMOTyDA datasets demonstrate FIRES’s effectiveness and generalizability in fostering higher-quality emotional support responses through adaptive reasoning.

MEMIT-Merge: Addressing MEMIT’s Key-Value Conflicts in Same-Subject Batch Editing for LLMs
Zilu Dong | Xiangqing Shen | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2025

As large language models (LLMs) continue to scale up, knowledge editing techniques that modify models’ internal knowledge without full retraining have gained significant attention. MEMIT, a prominent batch editing algorithm, stands out for its capability to perform mass knowledge modifications. However, we uncover a critical limitation: MEMIT’s editing efficacy deteriorates significantly when processing batches containing multiple edits that share the same subject. Our analysis reveals that the root cause lies in MEMIT’s key-value modeling framework: when multiple facts with the same subject in a batch are modeled through MEMIT’s key-value mechanism, identical keys (derived from the shared subject) are forced to represent different values (corresponding to distinct knowledge), resulting in update conflicts during editing. To address this issue, we propose MEMIT-Merge, an enhanced approach that merges value computation processes for facts sharing the same subject, effectively resolving the performance degradation in same-subject batch editing scenarios. Experimental results demonstrate that at a batch size of 5, while the original MEMIT’s success rate drops to 46%, MEMIT-Merge maintains a 98% editing success rate, showcasing remarkable robustness to subject entity collisions.

ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains
Zilu Dong | Xiangqing Shen | Zinong Yang | Rui Xia
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Current knowledge editing methods for large language models (LLMs) struggle to maintain logical consistency when propagating ripple effects to associated facts. We propose ChainEdit, a framework that synergizes knowledge graph-derived logical rules with LLM logical reasoning capabilities to enable systematic chain updates. By automatically extracting logical patterns from structured knowledge bases and aligning them with LLMs’ internal logics, ChainEdit dynamically generates and edits logically connected knowledge clusters. Experiments demonstrate an improvement of more than 30% in logical generalization over baselines while preserving editing reliability and specificity. We further address evaluation biases in existing benchmarks through knowledge-aware protocols that disentangle external dependencies. This work establishes new state-of-the-art performance on ripple effect while ensuring internal logical consistency after knowledge editing.

From Phrases to Subgraphs: Fine-Grained Semantic Parsing for Knowledge Graph Question Answering
Yurun Song | Xiangqing Shen | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2025

The recent emergence of large language models (LLMs) has brought new opportunities to knowledge graph question answering (KGQA), but also introduces challenges such as semantic misalignment and reasoning noise. Semantic parsing (SP), previously a mainstream approach for KGQA, enables precise graph pattern matching by mapping natural language queries to executable logical forms. However, it faces limitations in scalability and generalization, especially when dealing with complex, multi-hop reasoning tasks. In this work, we propose a Fine-Grained Semantic Parsing (FGSP) framework for KGQA. Our framework constructs a fine-grained mapping library via phrase-level segmentation of historical question-logical form pairs, and performs online retrieval and fusion of relevant subgraph fragments to answer complex queries. This fine-grained, compositional approach ensures tighter semantic alignment between questions and knowledge graph structures, enhancing both interpretability and adaptability to diverse query types. Experimental results on two KGQA benchmarks demonstrate the effectiveness of FGSP, with a notable 18.5% relative F1 performance improvement over the SOTA on the complex multi-hop CWQ dataset. Our code is available at https://github.com/NUSTM/From-Phrases-to-Subgraphs.

2023

Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths
Xiangqing Shen | Siwei Wu | Rui Xia
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

ATOMIC is a large-scale commonsense knowledge graph (CSKG) containing everyday if-then knowledge triplets, i.e., head event, relation, tail event. The one-hop annotation manner made ATOMIC a set of independent bipartite graphs, which ignored the numerous links between events in different bipartite graphs and consequently caused shortages in knowledge coverage and multi-hop paths. In this work, we aim to construct Dense-ATOMIC with high knowledge coverage and massive multi-hop paths. The events in ATOMIC are normalized to a consistent pattern at first. We then propose a CSKG completion method called Rel-CSKGC to predict the relation given the head event and the tail event of a triplet, and train a CSKG completion model based on existing triplets in ATOMIC. We finally utilize the model to complete the missing links in ATOMIC and accordingly construct Dense-ATOMIC. Both automatic and human evaluation on an annotated subgraph of ATOMIC demonstrate the advantage of Rel-CSKGC over strong baselines. We further conduct extensive evaluations on Dense-ATOMIC in terms of statistics, human evaluation, and simple downstream tasks, all proving Dense-ATOMIC’s advantages in Knowledge Coverage and Multi-hop Paths. Both the source code of Rel-CSKGC and Dense-ATOMIC are publicly available on https://github.com/NUSTM/Dense-ATOMIC.

Commonsense Knowledge Graph Completion Via Contrastive Pretraining and Node Clustering
Siwei Wu | Xiangqing Shen | Rui Xia
Findings of the Association for Computational Linguistics: ACL 2023

The nodes in the commonsense knowledge graph (CSKG) are normally represented by free-form short text (e.g., word or phrase). Different nodes may represent the same concept. This leads to the problems of edge sparsity and node redundancy, which challenge CSKG representation and completion. On the one hand, edge sparsity limits the performance of graph representation learning; on the other hand, node redundancy causes different nodes corresponding to the same concept to have inconsistent relations with other nodes. To address the two problems, we propose a new CSKG completion framework based on Contrastive Pretraining and Node Clustering (CPNC). Contrastive Pretraining constructs positive and negative head-tail node pairs on CSKG and utilizes contrastive learning to obtain better semantic node representation. Node Clustering aggregates nodes with the same concept into a latent concept, assisting the task of CSKG completion. We evaluate our CPNC approach on two CSKG completion benchmarks (CN-100K and ATOMIC), where CPNC outperforms the state-of-the-art methods. Extensive experiments demonstrate that both Contrastive Pretraining and Node Clustering can significantly improve the performance of CSKG completion. The source code of CPNC is publicly available on https://github.com/NUSTM/CPNC.

Co-authors

Zhen Wu 1

Venues

Findings 7
ACL 3
