Li Guo
2026
Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework
Zhuoshang Wang | Yubing Ren | Yanan Cao | Fang Fang | Xiaoxue Li | Li Guo
Findings of the Association for Computational Linguistics: ACL 2026
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack
Hao Li | Yubing Ren | Yanan Cao | Yingjie Li | Fang Fang | Shi Wang | Li Guo
Findings of the Association for Computational Linguistics: ACL 2026
With the rapid development of cloud-based services, large language models have become increasingly accessible through various web platforms. However, this accessibility has also led to growing risks of model abuse. LLM watermarking has emerged as an effective approach to mitigate such misuse and protect intellectual property. Existing watermarking algorithms, however, primarily focus on defending against paraphrase attacks while overlooking piggyback spoofing attacks, which can inject harmful content, compromise watermark reliability, and undermine trust in attribution. To address this limitation, we propose DualGuard, the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks. DualGuard employs an adaptive dual-stream watermarking mechanism, in which two complementary watermark signals are dynamically injected based on the semantic content. This design enables DualGuard not only to detect but also to trace spoofing attacks, thereby ensuring reliable and trustworthy watermark detection. Extensive experiments conducted across multiple datasets and language models demonstrate that DualGuard achieves excellent detectability, robustness, traceability, and text quality, effectively advancing the state of LLM watermarking for real-world applications.
Exons-Detect: Identifying and Amplifying Exonic Tokens via Hidden-State Discrepancy for Robust AI-Generated Text Detection
Xiaowei Zhu | Yubing Ren | Fang Fang | Shi Wang | Yanan Cao | Li Guo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The rapid advancement of large language models has increasingly blurred the boundary between human-written and AI-generated text, raising societal risks such as misinformation dissemination, authorship ambiguity, and threats to intellectual property rights. These concerns highlight the urgent need for effective and reliable detection methods. While existing training-free approaches often achieve strong performance by aggregating token-level signals into a global score, they typically assume uniform token contributions, making them less robust under short sequences or localized token modifications. To address these limitations, we propose Exons-Detect, a training-free method for AI-generated text detection based on an exon-aware token reweighting perspective. Exons-Detect identifies and amplifies informative exonic tokens by measuring hidden-state discrepancy under a dual-model setting, and computes an interpretable translation score from the resulting importance-weighted token sequence. Empirical evaluations demonstrate that Exons-Detect achieves state-of-the-art detection performance and exhibits strong robustness to adversarial attacks and varying input lengths. In particular, it attains a 2.2% relative improvement in average AUROC over the strongest prior baseline on DetectRL.
HyperMem: Hypergraph Memory for Long-Term Conversations
Juwei Yue | Chuanrui Hu | Jiawei Sheng | Zuyi Zhou | Wenyuan Zhang | Tingwen Liu | Li Guo | Yafeng Deng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Long-term memory is essential for conversational agents to maintain coherence, track persistent tasks, and provide personalized interactions across extended dialogues. However, existing approaches such as Retrieval-Augmented Generation (RAG) and graph-based memory mostly rely on pairwise relations, which can hardly capture high-order associations, i.e., joint dependencies among multiple elements, causing fragmented retrieval. To this end, we propose HyperMem, a hypergraph-based hierarchical memory architecture that explicitly models such associations using hyperedges. In particular, HyperMem structures memory into three levels (topics, episodes, and facts) and groups related episodes and their facts via hyperedges, unifying scattered content into coherent units. Leveraging this structure, we design a hybrid lexical-semantic index and a coarse-to-fine retrieval strategy, supporting accurate and efficient retrieval of high-order associations. Experiments on the LoCoMo benchmark show that HyperMem achieves state-of-the-art performance with 92.73% LLM-as-a-judge accuracy, demonstrating the effectiveness of HyperMem for long-term conversations.
2023
Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking
Qingyue Wang | Liang Ding | Yanan Cao | Yibing Zhan | Zheng Lin | Shi Wang | Dacheng Tao | Li Guo
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data. Existing works mainly study common data- or model-level augmentation methods to enhance generalization but fail to effectively decouple the semantics of samples, limiting the zero-shot performance of DST. In this paper, we present a simple and effective “divide, conquer and combine” solution, which explicitly disentangles the semantics of seen data and improves performance and robustness with the mixture-of-experts mechanism. Specifically, we divide the seen data into semantically independent subsets and train corresponding experts; unseen samples are then mapped to these experts and inferred with our designed ensemble inference. Extensive experiments on MultiWOZ2.1 upon T5-Adapter show that our schema significantly and consistently improves zero-shot performance, achieving the SOTA on settings without external knowledge, with only 10M trainable parameters.
2022
Slot Dependency Modeling for Zero-Shot Cross-Domain Dialogue State Tracking
Qingyue Wang | Yanan Cao | Piji Li | Yanhe Fu | Zheng Lin | Li Guo
Proceedings of the 29th International Conference on Computational Linguistics
2021
From What to Why: Improving Relation Extraction with Rationale Graph
Zhenyu Zhang | Bowen Yu | Xiaobo Shu | Xue Mengge | Tingwen Liu | Li Guo
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2020
Document-level Relation Extraction with Dual-tier Heterogeneous Graph
Zhenyu Zhang | Bowen Yu | Xiaobo Shu | Tingwen Liu | Hengzhu Tang | Wang Yubin | Li Guo
Proceedings of the 28th International Conference on Computational Linguistics
Document-level relation extraction (RE) poses new challenges over its sentence-level counterpart since it requires an adequate comprehension of the whole document and the ability to reason over multiple hops across sentences to reach the final result. In this paper, we propose a novel graph-based model with Dual-tier Heterogeneous Graph (DHG) for document-level RE. In particular, DHG is composed of a structure modeling layer followed by a relation reasoning layer. The major advantage is that it is capable of not only capturing both the sequential and structural information of documents but also mixing them together to benefit multi-hop reasoning and final decision-making. Furthermore, we employ a Graph Neural Network (GNN) based message propagation strategy to accumulate information on DHG. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on two widely used datasets, and further analyses suggest that all the modules in our model are indispensable for document-level RE.
2018
Improving Knowledge Graph Embedding Using Simple Constraints
Boyang Ding | Quan Wang | Bin Wang | Li Guo
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Early works performed this task via simple models developed over KG triples. Recent attempts focused on either designing more complicated triple scoring models, or incorporating extra information beyond triples. This paper, by contrast, investigates the potential of using very simple constraints to improve KG embedding. We examine non-negativity constraints on entity representations and approximate entailment constraints on relation representations. The former help to learn compact and interpretable representations for entities. The latter further encode regularities of logical entailment between relations into their distributed representations. These constraints impose prior beliefs upon the structure of the embedding space, without negative impacts on efficiency or scalability. Evaluation on WordNet, Freebase, and DBpedia shows that our approach is simple yet surprisingly effective, significantly and consistently outperforming competitive baselines. The constraints imposed indeed improve model interpretability, leading to a substantially increased structuring of the embedding space. Code and data are available at https://github.com/iieir-km/ComplEx-NNE_AER.
2016
Jointly Embedding Knowledge Graphs and Logical Rules
Shu Guo | Quan Wang | Lihong Wang | Bin Wang | Li Guo
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
2015
Context-Dependent Knowledge Graph Embedding
Yuanfei Luo | Quan Wang | Bin Wang | Li Guo
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Semantically Smooth Knowledge Graph Embedding
Shu Guo | Quan Wang | Bin Wang | Lihong Wang | Li Guo
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2014
Co-authors
- Yanan Cao 5
- Quan Wang 5
- Bin Wang 5
- Fang Fang 3
- Tingwen Liu 3
- Yubing Ren 3
- Shi Wang 3
- Shu Guo 2
- Zheng Lin 2
- Xiaobo Shu 2
- Qingyue Wang 2
- Lihong Wang 2
- Bowen Yu 2
- Zhenyu Zhang 2
- Shuo Bai 1
- Yafeng Deng 1
- Liang Ding 1
- Boyang Ding 1
- Yanhe Fu 1
- Yue Hu (胡月) 1
- Chuanrui Hu 1
- Xiaoxue Li 1
- Piji Li (李丕绩) 1
- Hao Li 1
- Yingjie Li 1
- Jiguang Liang 1
- Jing Liu (刘晶, 刘璟) 1
- Yuanfei Luo 1
- Xue Mengge 1
- Jiawei Sheng 1
- Hengzhu Tang 1
- Dacheng Tao 1
- Zhuoshang Wang 1
- Wang Yubin 1
- Juwei Yue 1
- Yibing Zhan 1
- Wenyuan Zhang 1
- Xiaofei Zhou 1
- Zuyi Zhou 1
- Xiaowei Zhu 1