Li Li


2025

pdf bib
AD-LLM: Benchmarking Large Language Models for Anomaly Detection
Tiankai Yang | Yi Nian | Li Li | Ruiyao Xu | Yuangang Li | Jiaqi Li | Zhuo Xiao | Xiyang Hu | Ryan A. Rossi | Kaize Ding | Xia Hu | Yue Zhao
Findings of the Association for Computational Linguistics: ACL 2025

Anomaly detection (AD) is an important machine learning task with many real-world uses, including fraud detection, medical diagnosis, and industrial monitoring. Within natural language processing (NLP), AD helps detect issues like spam, misinformation, and unusual user activity. Although large language models (LLMs) have had a strong impact on tasks such as text generation and summarization, their potential in AD has not been studied enough. This paper introduces AD-LLM, the first benchmark that evaluates how LLMs can help with NLP anomaly detection. We examine three key tasks: (i) zero-shot detection, using LLMs’ pre-trained knowledge to perform AD without task-specific training; (ii) data augmentation, generating synthetic data and category descriptions to improve AD models; and (iii) model selection, using LLMs to suggest unsupervised AD models. Through experiments with different datasets, we find that LLMs can work well in zero-shot AD, that carefully designed augmentation methods are useful, and that explaining model selection for specific datasets remains challenging. Based on these results, we outline six future research directions on LLMs for AD.
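The zero-shot setting above reduces to prompting the model and parsing its verdict. Below is a minimal sketch of that idea; the `query_llm` helper, the prompt wording, and the label parsing are hypothetical placeholders, not the benchmark's actual protocol.

```python
# Minimal sketch of zero-shot anomaly detection with an LLM.
# `query_llm` is a hypothetical stand-in for any chat-completion client;
# the prompt and label parsing are illustrative, not AD-LLM's exact setup.
from typing import Callable, List

def zero_shot_ad(texts: List[str], category: str,
                 query_llm: Callable[[str], str]) -> List[int]:
    """Return 1 for texts the LLM flags as anomalous w.r.t. `category`, else 0."""
    labels = []
    for text in texts:
        prompt = (
            f"The following text is supposed to belong to the category '{category}'.\n"
            f"Text: {text}\n"
            "Answer with a single word, 'normal' or 'anomalous'."
        )
        answer = query_llm(prompt).strip().lower()
        labels.append(1 if "anomal" in answer else 0)
    return labels
```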

pdf bib
Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning
Song Yu | Xiaofei Xu | Ke Deng | Li Li | Lin Tian
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) face persistent challenges when handling long-context tasks, most notably the “lost in the middle” issue, where information located in the middle of a long input tends to be underutilized. Existing methods that shorten the input risk discarding key information, while those that extend the context window often lead to attention dispersion. To address these limitations, we propose Tree of Agents (TOA), a multi-agent reasoning framework that segments the input into chunks processed by independent agents. Each agent forms a local cognition of its chunk, and agents then dynamically exchange information for collaborative reasoning along tree-structured paths. TOA enables agents to probe different reasoning orders for multi-perspective understanding, effectively mitigating position bias and reducing hallucinations. To improve processing efficiency, we incorporate prefix-hash caching and adaptive pruning strategies, achieving significant performance improvements with comparable API overhead. Experiments show that TOA, powered by the compact LLaMA3.1-8B, significantly outperforms multiple baselines and performs comparably to much larger recent commercial models, such as Gemini1.5-pro, on various long-context tasks. Code is available at https://github.com/Aireduce952/Tree-of-Agents.
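The chunk-then-collaborate idea can be sketched as follows. The `agent_read` and `agent_reason` helpers are hypothetical LLM-call wrappers, and this toy version merely probes a few reading orders; the paper's tree-structured exchange, prefix-hash caching, and pruning are not reproduced.

```python
# Toy sketch of chunking a long input and reasoning over it from several
# reading orders. `agent_read` and `agent_reason` are hypothetical wrappers
# around LLM calls; the real framework exchanges information along
# tree-structured paths and uses caching/pruning, which this omits.
from itertools import islice, permutations
from typing import Callable, List

def split_into_chunks(text: str, chunk_size: int = 2000) -> List[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def tree_of_agents(document: str, question: str,
                   agent_read: Callable[[str, str], str],
                   agent_reason: Callable[[List[str], str], str],
                   max_orders: int = 3) -> List[str]:
    chunks = split_into_chunks(document)
    # Each agent forms a local "cognition" of its own chunk.
    notes = [agent_read(chunk, question) for chunk in chunks]
    # Probe a few different reading orders to mitigate position bias.
    answers = []
    for order in islice(permutations(range(len(notes))), max_orders):
        ordered = [notes[i] for i in order]
        answers.append(agent_reason(ordered, question))
    return answers  # e.g., aggregate by majority vote downstream
```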

pdf bib
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li | Jiashu Qu | Linxin Song | Yuxiao Zhou | Yuehan Qin | Tiankai Yang | Yue Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025

Vision-Language Models (VLMs) excel at tasks such as image captioning and visual question answering but frequently produce hallucinated outputs that deviate from the actual visual input or prompt. While prior work links hallucination to biases in data or representation, their causal origins remain unclear. We propose a causal framework to analyze and mitigate hallucination in VLMs. Our key hypothesis is that hallucinations arise from unintended direct influences of the vision or text modality that bypass the intended multi-modal fusion. To examine this, we construct a causal graph of the VLM and use counterfactual analysis to estimate the Natural Direct Effect (NDE) of each modality and their interaction. By systematically identifying and suppressing these direct effects, we encourage outputs that are more faithfully grounded in true cross-modal reasoning. Our approach consists of three steps: (1) designing structural causal graphs to distinguish correct fusion pathways from spurious modality shortcuts, (2) estimating modality-specific and cross-modal NDE using perturbed image representations, hallucinated text embeddings, and degraded visual inputs, and (3) implementing a test-time intervention module to dynamically adjust the model’s dependence on each modality. Experimental results demonstrate that our method significantly reduces hallucination while preserving task performance, providing a robust and interpretable framework for improving VLM reliability.
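For reference, the natural direct effect can be written in standard mediation-analysis notation, as below. Treating a single modality as the exposure and the fused multi-modal representation as the mediator is our hedged reading of the setup; the paper's concrete estimators are not reproduced here.

```latex
% Standard counterfactual definition of the natural direct effect (NDE)
% of an exposure X (changed from x to x') on outcome Y with mediator M.
% Hedged reading: X is a single modality (vision or text) and M is the
% fused multi-modal representation.
\[
  \mathrm{NDE}_{x \to x'} \;=\; \mathbb{E}\left[\, Y_{x',\, M_x} \,\right] \;-\; \mathbb{E}\left[\, Y_{x,\, M_x} \,\right]
\]
```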

pdf bib
LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem
Hongyi Liu | Shaochen Zhong | Xintong Sun | Minghao Tian | Mohsen Hariri | Zirui Liu | Ruixiang Tang | Zhimeng Jiang | Jiayi Yuan | Yu-Neng Chuang | Li Li | Soo-Hyun Choi | Rui Chen | Vipin Chaudhary | Xia Hu
Findings of the Association for Computational Linguistics: EMNLP 2025

Backdoor attacks are powerful and effective, but distributing LLMs without a proven track record like `meta-llama` or `qwen` rarely gains community traction. We identify LoRA sharing as a unique scenario where users are more willing to try unendorsed assets, since such shared LoRAs allow them to enjoy personalized LLMs with negligible investment. However, this convenient share-and-play ecosystem also introduces a new attack surface, where attackers can distribute malicious LoRAs to an undefended community. Despite the high-risk potential, no prior art has comprehensively explored LoRA’s attack surface under the downstream-enhancing share-and-play context. In this paper, we investigate how backdoors can be injected into task-enhancing LoRAs and examine the mechanisms of such infections. We find that with a simple, efficient, yet specific recipe, **a backdoor LoRA can be trained once and then seamlessly merged (in a training-free fashion) with multiple task-enhancing LoRAs, retaining both its malicious backdoor and benign downstream capabilities.** This allows attackers to scale the distribution of compromised LoRAs with minimal effort by leveraging the rich pool of existing shared LoRA assets. We note that such merged LoRAs are particularly *infectious*, because their malicious intent is cleverly concealed behind improved downstream capabilities, creating a strong incentive for voluntary download, and *dangerous*, because under local deployment, no safety measures exist to intervene when things go wrong. Our work is among the first to study this new threat model of training-free distribution of downstream-capable-yet-backdoor-injected LoRAs, highlighting the urgent need for heightened security awareness in the LoRA ecosystem. **Warning: This paper contains offensive content and involves a real-life tragedy.**
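The training-free merging discussed above rests on the fact that LoRA updates are additive low-rank corrections to a frozen base weight. The sketch below shows only that arithmetic, with illustrative shapes and scaling factors; it is not the paper's attack recipe.

```python
# Toy illustration of why LoRA merging can be training-free: each adapter
# contributes an additive low-rank update B @ A to the frozen base weight,
# so several adapters can simply be summed. Shapes and scaling factors are
# illustrative; this is not the paper's attack recipe.
from typing import List, Tuple
import torch

def merge_loras(w_base: torch.Tensor,
                adapters: List[Tuple[torch.Tensor, torch.Tensor, float]]) -> torch.Tensor:
    """adapters: list of (A, B, alpha) with A of shape (r, d_in), B of shape (d_out, r)."""
    w = w_base.clone()
    for A, B, alpha in adapters:
        w += alpha * (B @ A)  # additive low-rank update
    return w

d_out, d_in, r = 16, 16, 4
base = torch.randn(d_out, d_in)
task_lora = (torch.randn(r, d_in), torch.randn(d_out, r), 0.5)
other_lora = (torch.randn(r, d_in), torch.randn(d_out, r), 0.5)
print(merge_loras(base, [task_lora, other_lora]).shape)  # torch.Size([16, 16])
```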

2024

pdf bib
Evaluating Step-by-Step Reasoning through Symbolic Verification
YiFan Zhang | Hanlin Zhang | Li Li | Eric Xing
Findings of the Association for Computational Linguistics: NAACL 2024

Pre-trained language models (LMs) have shown remarkable reasoning performance using explanations or chain-of-thought (CoT) for in-context learning. On the other hand, these reasoning tasks are usually presumed to be more approachable for symbolic programming. To understand the reasoning mechanism of LMs, we curate synthetic datasets containing equivalent (natural, symbolic) data pairs, where symbolic examples contain first-order logic rules and predicates from non-parametric knowledge bases (KBs), supporting automated verification of intermediate reasoning results. We then revisit neuro-symbolic approaches and propose LMLP, which learns from demonstrations containing logic rules and corresponding examples to iteratively reason over KBs, recovering Prolog’s backward chaining algorithm and supporting automated verification of LMs’ outputs. Comprehensive experiments systematically compare LMLP with CoT in deductive reasoning settings, showing that LMLP enjoys more than 25% higher accuracy than CoT on length generalization benchmarks even with smaller model sizes.
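Since the approach recovers Prolog's backward chaining, a toy version of that search over a hand-written, propositionalized knowledge base is sketched below. The `RULES`/`FACTS` structures and the `prove` helper are illustrative stand-ins; the actual method drives such reasoning with LM-generated steps that are then verified.

```python
# Toy propositionalized backward-chaining search over a hand-written rule
# base, in the spirit of the Prolog procedure the abstract refers to.
# The KB and `prove` helper are illustrative, not the paper's datasets.
from typing import Dict, List

RULES: Dict[str, List[List[str]]] = {
    # head -> alternative bodies (conjunctions of subgoals)
    "grandparent(a,c)": [["parent(a,b)", "parent(b,c)"]],
}
FACTS = {"parent(a,b)", "parent(b,c)"}

def prove(goal: str, depth: int = 0, max_depth: int = 10) -> bool:
    if depth > max_depth:
        return False
    if goal in FACTS:
        return True
    return any(all(prove(sub, depth + 1, max_depth) for sub in body)
               for body in RULES.get(goal, []))

print(prove("grandparent(a,c)"))  # True
```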

2022

pdf bib
Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization
Juncai Guo | Jin Liu | Yao Wan | Li Li | Pingyi Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automatic code summarization, which aims to describe source code in natural language, has become an essential task in software maintenance. Researchers have attempted to achieve this purpose through various machine learning-based approaches. One key challenge keeping these approaches from being practical is that they fail to retain the semantic structure of source code, which has unfortunately been overlooked by the state-of-the-art. Existing approaches resort to representing the syntax structure of code by modeling its Abstract Syntax Tree (AST). However, the hierarchical structure of ASTs has not been well explored. In this paper, we propose CODESCRIBE, which models the hierarchical syntax structure of code by introducing a novel triplet position for code summarization. Specifically, CODESCRIBE leverages a graph neural network and a Transformer to preserve the structural and sequential information of code, respectively. In addition, we propose a pointer-generator network that attends to both the structure and the sequential tokens of code for better summary generation. Experiments on two real-world datasets in Java and Python demonstrate the effectiveness of our approach compared with several state-of-the-art baselines.
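The abstract does not spell out what the triplet position encodes, so purely as an illustration of attaching positional information to AST nodes, the sketch below assigns each node a hypothetical (depth, child_index, parent_id) triple using Python's own `ast` module; this placeholder should not be read as CODESCRIBE's definition.

```python
# Illustrative only: assign each AST node a hypothetical positional triple
# (depth, child_index, parent_id) via a preorder walk. This is a placeholder
# for the idea of node positions in a syntax tree, not CODESCRIBE's triplet.
import ast

def node_triplets(source: str):
    tree = ast.parse(source)
    triplets = {}

    def walk(node, depth, child_index, parent_id):
        node_id = len(triplets)
        triplets[node_id] = (type(node).__name__, (depth, child_index, parent_id))
        for i, child in enumerate(ast.iter_child_nodes(node)):
            walk(child, depth + 1, i, node_id)

    walk(tree, 0, 0, -1)
    return triplets

for nid, (name, pos) in node_triplets("def add(a, b):\n    return a + b").items():
    print(nid, name, pos)
```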

pdf bib
CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training
Xin Wang | Yasheng Wang | Yao Wan | Jiawei Wang | Pingyi Zhou | Li Li | Hao Wu | Jin Liu
Findings of the Association for Computational Linguistics: NAACL 2022

Recent years have witnessed increasing interest in code representation learning, which aims to encode the semantics of source code into distributed vectors. Various works have been proposed to represent the complex semantics of source code from different views, including plain text, the Abstract Syntax Tree (AST), and several kinds of code graphs (e.g., Control/Data Flow Graph). However, most of them only consider a single view of source code, ignoring the correspondences among different views. In this paper, we propose to integrate different views with the natural-language description of source code into a unified framework with Multi-View contrastive Pre-training, and name our model CODE-MVP. Specifically, we first extract multiple code views using compiler tools and learn the complementary information among them under a contrastive learning framework. Inspired by type checking in compilation, we also design a fine-grained type inference objective for pre-training. Experiments on three downstream tasks over five datasets demonstrate the superiority of CODE-MVP compared with several state-of-the-art baselines. For example, we achieve gains of 2.4/2.3/1.1 in MRR/MAP/Accuracy on natural language code retrieval, code similarity, and code defect detection, respectively.
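A minimal sketch of the contrastive component, assuming a standard InfoNCE objective between two encoded views of the same snippet; the encoders and temperature are placeholders, and the type-inference objective is omitted, so this is not CODE-MVP's exact formulation.

```python
# Minimal InfoNCE-style contrastive loss between two views of the same
# code snippets (e.g., token sequence vs. AST encoding). Encoders and the
# temperature are placeholders; CODE-MVP's full set of objectives
# (including type inference) is not reproduced here.
import torch
import torch.nn.functional as F

def info_nce(view_a: torch.Tensor, view_b: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """view_a, view_b: (batch, dim) embeddings of the same snippets in two views."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```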

2021

pdf bib
Fast and Accurate Neural Machine Translation with Translation Memory
Qiuxiang He | Guoping Huang | Qu Cui | Li Li | Lemao Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

It is generally believed that a translation memory (TM) should be beneficial for machine translation tasks. Unfortunately, existing work demonstrates the superiority of TM-based neural machine translation (NMT) only on TM-specialized translation tasks rather than on general tasks, and with a non-negligible computational overhead. In this paper, we propose a fast and accurate approach to TM-based NMT within the Transformer framework: the model architecture is simple and employs a single bilingual sentence as its TM, leading to efficient training and inference; its parameters are effectively optimized through a novel training criterion. Extensive experiments on six TM-specialized tasks show that the proposed approach substantially surpasses several strong baselines that use multiple TMs, in terms of both BLEU and running time. In particular, the proposed approach also outperforms the strong baselines on two general tasks (WMT news Zh->En and En->De).
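A minimal sketch of retrieving a single bilingual sentence to serve as the TM, assuming a standard fuzzy-match score based on sequence similarity; the retrieval metric and the tiny memory below are illustrative, and the paper's own retrieval setup and model architecture are not reproduced.

```python
# Sketch of retrieving a single (source, target) pair as the translation
# memory for an input sentence, using a generic fuzzy-match score. The
# metric and data are illustrative placeholders only.
from difflib import SequenceMatcher
from typing import List, Tuple

def retrieve_tm(source: str, memory: List[Tuple[str, str]]) -> Tuple[str, str]:
    """memory: list of (source_sentence, target_sentence) pairs."""
    def fuzzy(a: str, b: str) -> float:
        return SequenceMatcher(None, a.split(), b.split()).ratio()
    return max(memory, key=lambda pair: fuzzy(source, pair[0]))

tm = [("the cat sat on the mat", "le chat était assis sur le tapis"),
      ("the dog barked loudly", "le chien a aboyé fort")]
print(retrieve_tm("the cat sat on a mat", tm))
```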

2015

pdf bib
Multi-label Text Categorization with Joint Learning Predictions-as-Features Method
Li Li | Houfeng Wang | Xu Sun | Baobao Chang | Shi Zhao | Lei Sha
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
Predicting Chinese Abbreviations with Minimum Semantic Unit and Global Constraints
Longkai Zhang | Li Li | Houfeng Wang | Xu Sun
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Multi-label Text Categorization with Hidden Components
Li Li | Longkai Zhang | Houfeng Wang
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
Longkai Zhang | Li Li | Zhengyan He | Houfeng Wang | Ni Sun
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2010

pdf bib
Person Name Disambiguation based on Topic Model
Jiashen Sun | Tianmin Wang | Li Li | Xing Wu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2002

pdf bib
An Indexing Method Based on Sentences
Li Li | Chunfa Yuan | K.F. Wong | Wenjie Li
COLING-02: The First SIGHAN Workshop on Chinese Language Processing

1998

pdf bib
A Test Environment for Natural Language Understanding Systems
Li Li | Deborah A. Dahl | Lewis M. Norton | Marcia C. Linebarger | Dongdong Chen
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib
Integration of Large-Scale Linguistic Resources in a Natural Language Understanding System
Lewis M. Norton | Deborah A. Dahl | Li Li | Katharine P. Beals
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib
A Test Environment for Natural Language Understanding Systems
Li Li | Deborah A. Dahl | Lewis M. Norton | Marcia C. Linebarger | Dongdong Chen
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib
Integration of Large-Scale Linguistic Resources in a Natural Language Understanding System
Lewis M. Norton | Deborah A. Dahl | Li Li | Katharine P. Beals
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

pdf bib
NL Assistant: A Toolkit for Developing Natural Language Applications
Deborah A. Dahl | Lewis M. Norton | Ahmed Bouzid | Li Li
Fifth Conference on Applied Natural Language Processing: Descriptions of System Demonstrations and Videos