Desheng Wu

2026

Leveraging Human and Machine Preferences for Zero-shot Detection of AI-Generated Text
Lei Jiang | Desheng Wu | Xiaolong Zheng | Cuicui Luo
Findings of the Association for Computational Linguistics: ACL 2026

In recent years, the rapid advancement of large language models (LLMs) has enabled generated texts to closely mimic human writing, posing significant challenges to the detection of AI-generated content. Current mainstream zero-shot detection methods largely adopt a machine-centric perspective, relying on proxy models to compute token-level AI-likelihood scores and treating all tokens equally during overall detection. However, such approaches overlook the prediction discrepancies that arise when humans and large language models interpret the same text. We argue that tokens exhibiting greater divergence between human and machine predictions can provide stronger clues for determining the authorship of a text. To address this limitation, we propose HAPDA—a human-machine prediction discrepancy adapter for AI-generated text detection (AGTD). The framework consists of two core components: (1) a joint fine-tuning strategy for training paired human-preference and machine-preference models, and (2) a discrepancy-aware reweighting mechanism designed to calibrate token-level detection scores in downstream detectors. Extensive experiments demonstrate that HAPDA consistently and significantly enhances the detection performance of five representative baseline models under various evaluation scenarios.

Extending First-Order Logic for Factual Reasoning over Knowledge Graphs
Yuanzhen Hao | Desheng Wu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

First-order logic (FOL) is a fundamental formalism for factual reasoning over knowledge graphs (KGs), e.g., in research on KG-based fact verification and on the logical consistency or reasoning of large language models (LLMs). However, existing benchmarks and approaches insufficiently capture many claims that require comparison or counting, and lack support for several FOL quantifiers and connectives. To address these challenges and expand the expressive capacity of FOL for KG-based reasoning, we introduce FOLX-KG, a novel extended FOL 𝜎-structure over KGs that incorporates comparison predicates and counting quantifiers. Using this extended logic, we construct Fact-FOLX-KG, a fact verification dataset consisting of 43,821 KG-based claim–formula pairs designed to enable systematic study of richer logical forms and reasoning types. We further propose FOLX Prover, an executable program-guided logic reasoning pipeline adapted for KG-based factual reasoning under the extended FOL. Experimental results show that our method achieves state-of-the-art performance on Fact-FOLX-KG, while previous methods suffer a performance drop on claims requiring comparison and counting. These findings demonstrate the importance of extended logical expressiveness for robust factual reasoning over KGs.

2025

Fact Verification on Knowledge Graph via Programmatic Graph Reasoning
Yuanzhen Hao | Desheng Wu
Findings of the Association for Computational Linguistics: EMNLP 2025

Fact verification on knowledge graphs (KGs) uses the structured representation of entities and relations as evidence for validating claims. Previous methods for KG-based fact verification predominantly use natural language inference (NLI) models to predict entailment between claims and KG triples, based on implicit reasoning. We propose Programmatic Graph Reasoning (PGR), a novel framework that integrates large language models (LLMs) for fact verification on KGs. PGR explicitly encodes the reasoning process as a graph reasoning program composed of predefined functions to verify claims step by step. These functions are executed sequentially for graph reasoning and final result prediction. By making the graph reasoning process explicit, PGR ensures more precise and transparent reasoning steps compared to implicit methods. Experimental results on the FactKG dataset demonstrate that PGR achieves state-of-the-art performance with 86.82% accuracy, outperforming all the baseline models. Further analysis confirms the interpretability and effectiveness of our method in handling complex graph reasoning.

SenDetEX: Sentence-Level AI-Generated Text Detection for Human-AI Hybrid Content via Style and Context Fusion
Lei Jiang | Desheng Wu | Xiaolong Zheng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Text generated by large language models (LLMs) now rivals human writing, raising concerns about its misuse. However, mainstream AI-generated text detection (AGTD) methods primarily target long, document-level texts and struggle to generalize to short, sentence-level texts. Moreover, current sentence-level AGTD (S-AGTD) research faces two significant limitations: (1) the lack of a comprehensive evaluation on complex human-AI hybrid content, where human-written text (HWT) and AI-generated text (AGT) alternate irregularly, and (2) the failure to incorporate contextual information, which serves as a crucial supplementary feature for identifying the origin of the detected sentence. We therefore propose AutoFill-Refine, a high-quality synthesis strategy for human-AI hybrid texts, and then construct a dedicated S-AGTD benchmark dataset. We also introduce SenDetEX, a novel framework for sentence-level AI-generated text detection via style and context fusion. Extensive experiments demonstrate that SenDetEX significantly outperforms all baseline models in detection accuracy, while exhibiting remarkable transferability and robustness. Source code is available at https://github.com/TristoneJiang/SenDetEX.
