2025
pdf
bib
abs
AdaEdit: Advancing Continuous Knowledge Editing For Large Language Models
Qi Li
|
Xiaowen Chu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Knowledge editing (KE) has emerged as a prominent alternative that enables efficient and precise information modification inside language models. However, a critical challenge arises in continuous language models editing — a significant performance decline both in knowledge update and retention when the number of edits increases. By dissecting the perturbation weight of language model in continuous KE, we uncover that disentangled and sparsified knowledge representation can significantly alleviate the performance decline. Building on these insights, we introduce AdaEdit, a novel knowledge editing method. Extensive empirical evaluations on multiple LLMs demonstrate that our proposed methods can enhance the performance of edited LLMs in large-size continuous editing regimes, outperforming existing ones without substantially compromising the general abilities of these models.
pdf
bib
abs
from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
Yu Yan
|
Sheng Sun
|
Zenghao Duan
|
Teli Liu
|
Min Liu
|
Zhiyi Yin
|
LeiJingyu LeiJingyu
|
Qi Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Current studies have exposed the risk of Large Language Models (LLMs) generating harmful content by jailbreak attacks. However, they overlook that the direct generation of harmful content from scratch is more difficult than inducing LLM to calibrate benign content into harmful forms.In our study, we introduce a novel attack framework that exploits AdVersArial meTAphoR (AVATAR) to induce the LLM to calibrate malicious metaphors for jailbreaking.Specifically, to answer harmful queries, AVATAR adaptively identifies a set of benign but logically related metaphors as the initial seed.Then, driven by these metaphors, the target LLM is induced to reason and calibrate about the metaphorical content, thus jailbroken by either directly outputting harmful responses or calibrating residuals between metaphorical and professional harmful content.Experimental results demonstrate that AVATAR can effectively and transferably jailbreak LLMs and achieve a state-of-the-art attack success rate across multiple advanced LLMs.
pdf
bib
abs
Towards a More Generalized Approach in Open Relation Extraction
Qing Wang
|
Yuepei Li
|
Qiao Qiao
|
Kang Zhou
|
Qi Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Open Relation Extraction (OpenRE) seeks to identify and extract novel relational facts between named entities from unlabeled data without pre-defined relation schemas. Traditional OpenRE methods typically assume that the unlabeled data consists solely of novel relations or is pre-divided into known and novel instances. However, in real-world scenarios, novel relations are arbitrarily distributed. In this paper, we propose a generalized OpenRE setting that considers unlabeled data as a mixture of both known and novel instances. To address this, we propose MixORE, a two-phase framework that integrates relation classification and clustering to jointly learn known and novel relations. Experiments on three benchmark datasets demonstrate that MixORE consistently outperforms competitive baselines in known relation classification and novel relation clustering. Our findings contribute to the advancement of generalized OpenRE research and real-world applications.
pdf
bib
abs
Retrieval Augmented Instruction Tuning for Open NER with Large Language Models
Tingyu Xie
|
Jian Zhang
|
Yan Zhang
|
Yuanyuan Liang
|
Qi Li
|
Hongwei Wang
Proceedings of the 31st International Conference on Computational Linguistics
The strong capability of large language models (LLMs) has been applied to information extraction (IE) through either retrieval augmented prompting or instruction tuning (IT). However, the best way to incorporate information with LLMs for IE remains an open question. In this paper, we explore Retrieval Augmented Instruction Tuning (RA-IT) for IE, focusing on the task of open named entity recognition (NER). Specifically, for each training sample, we retrieve semantically similar examples from the training dataset as the context and prepend them to the input of the original instruction. To evaluate our RA-IT approach more thoroughly, we construct a Chinese IT dataset for open NER and evaluate RA-IT in both English and Chinese scenarios. Experimental results verify the effectiveness of RA-IT across various data sizes and in both English and Chinese scenarios. We also conduct thorough studies to explore the impacts of various retrieval strategies in the proposed RA-IT framework.
pdf
bib
abs
Investigating Context Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style
Yuepei Li
|
Kang Zhou
|
Qiao Qiao
|
Bach Nguyen
|
Qing Wang
|
Qi Li
Findings of the Association for Computational Linguistics: ACL 2025
Retrieval-augmented generation (RAG) improves Large Language Models (LLMs) by incorporating external information into the response generation process. However, how context-faithful LLMs are and what factors influence LLMs’ context faithfulness remain largely unexplored. In this study, we investigate the impact of memory strength and evidence presentation on LLMs’ receptiveness to external evidence. We quantify the memory strength of LLMs by measuring the divergence in LLMs’ responses to different paraphrases of the same question, which is not considered by previous works. We also generate evidence in various styles to examine LLMs’ behavior. Our results show that for questions with high memory strength, LLMs are more likely to rely on internal memory. Furthermore, presenting paraphrased evidence significantly increases LLMs’ receptiveness compared to simple repetition or adding details. These findings provide key insights for improving retrieval-augmented generation and context-aware LLMs. Our code is available at https://github.com/liyp0095/ContextFaithful.
pdf
bib
abs
Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models
Sibo Yi
|
Tianshuo Cong
|
Xinlei He
|
Qi Li
|
Jiaxing Song
Findings of the Association for Computational Linguistics: ACL 2025
Small language models (SLMs) have become increasingly prominent in the deployment on edge devices due to their high efficiency and low computational cost. While researchers continue to advance the capabilities of SLMs through innovative training strategies and model compression techniques, the security risks of SLMs have received considerably less attention compared to large language models (LLMs). To fill this gap, we provide a comprehensive empirical study to evaluate the security performance of 13 state-of-the-art SLMs under various jailbreak attacks. Our experiments demonstrate that most SLMs are quite susceptible to existing jailbreak attacks, while some of them are even vulnerable to direct harmful prompts. To address the safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in enhancing the security of SLMs. We further analyze the potential security degradation caused by different SLM techniques including architecture compression, quantization, knowledge distillation, and so on. We expect that our research can highlight the security challenges of SLMs and provide valuable insights to future work in developing more robust and secure SLMs.
pdf
bib
abs
Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?
Seok Hwan Song
|
Mohna Chakraborty
|
Qi Li
|
Wallapak Tavanapong
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) have been evaluated using diverse question types, e.g., multiple-choice, true/false, and short/long answers. This study answers an unexplored question about the impact of different question types on LLM accuracy on reasoning tasks. We investigate the performance of five LLMs on three different types of questions using quantitative and deductive reasoning tasks. The performance metrics include accuracy in the reasoning steps and choosing the final answer. Key Findings: (1) Significant differences exist in LLM performance across different question types. (2) Reasoning accuracy does not necessarily correlate with the final selection accuracy. (3) The number of options and the choice of words, influence LLM performance.
2024
pdf
bib
abs
Structured Object Language Modeling (SO-LM): Native Structured Objects Generation Conforming to Complex Schemas with Self-Supervised Denoising
Amir Tavanaei
|
Kee Kiat Koo
|
Hayreddin Ceker
|
Shaobai Jiang
|
Qi Li
|
Julien Han
|
Karim Bouyarmane
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
In this paper, we study the problem of generating structured objects that conform to a complex schema, with intricate dependencies between the different components (facets) of the object. The facets of the object (attributes, fields, columns, properties) can be a mix of short, structured facts, or long natural-language descriptions. The object has to be self-consistent between the different facets in the redundant information it carries (relative consistency), while being grounded with respect to world knowledge (absolute consistency). We frame the problem as a Language Modeling problem (Structured Object Language Modeling) and train an LLM to perform the task natively, without requiring instructions or prompt-engineering. We propose a self-supervised denoising method to train the model from an existing dataset of such objects. The input query can be the existing object itself, in which case the system acts as a regenerator, completing, correcting, normalizing the input, or any unstructured blurb to be structured. We show that the self-supervised denoising training provides a strong baseline, and that additional supervised fine-tuning with small amount of human demonstrations leads to further improvement. Experimental results show that the proposed method matches or outperforms prompt-engineered general-purpose state-of-the-art LLMs (Claude 3, Mixtral-8x7B), while being order-of-magnitude more cost-efficient.
pdf
bib
abs
Can We Continually Edit Language Models? On the Knowledge Attenuation in Sequential Model Editing
Qi Li
|
Xiaowen Chu
Findings of the Association for Computational Linguistics: ACL 2024
Model editing has become a promising method for precisely and effectively updating knowledge in language models. In this paper, we investigate knowledge attenuation, in which the retention of updated knowledge within the language model decreases as the number of edits increases after sequential editing. Through empirical study, we discovered that existing editing methods generally suffer from knowledge attenuation. We attribute this phenomenon to two aspects: (1) redundant parameters interference and (2) update weight disentanglement. To this end, we propose the AdaPLE method. It not only mitigates the knowledge attenuation issue but also improves the performance on existing benchmarks. To the best of our knowledge, we are the first to investigate the cause and mitigation of knowledge attenuation in sequential LLM editing.
pdf
bib
abs
Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models
Tingyu Xie
|
Qi Li
|
Yan Zhang
|
Zuozhu Liu
|
Hongwei Wang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Exploring the application of powerful large language models (LLMs) on the named entity recognition (NER) task has drawn much attention recently. This work pushes the performance boundary of zero-shot NER with LLMs by proposing a training-free self-improving framework, which utilizes an unlabeled corpus to stimulate the self-learning ability of LLMs. First, we use the LLM to make predictions on the unlabeled corpus using self-consistency and obtain a self-annotated dataset. Second, we explore various strategies to select reliable annotations to form a reliable self-annotated dataset. Finally, for each test input, we retrieve demonstrations from the reliable self-annotated dataset and perform inference via in-context learning. Experiments on four benchmarks show substantial performance improvements achieved by our framework. Through comprehensive experimental analysis, we find that increasing the size of unlabeled corpus or iterations of self-improving does not guarantee further improvement, but the performance might be boosted via more advanced strategies for reliable annotation selection.
2023
pdf
bib
abs
Empirical Study of Zero-Shot NER with ChatGPT
Tingyu Xie
|
Qi Li
|
Jian Zhang
|
Yan Zhang
|
Zuozhu Liu
|
Hongwei Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) exhibited powerful capability in various natural language processing tasks. This work focuses on exploring LLM performance on zero-shot information extraction, with a focus on the ChatGPT and named entity recognition (NER) task. Inspired by the remarkable reasoning capability of LLM on symbolic and arithmetic reasoning, we adapt the prevalent reasoning methods to NER and propose reasoning strategies tailored for NER. First, we explore a decomposed question-answering paradigm by breaking down the NER task into simpler subproblems by labels. Second, we propose syntactic augmentation to stimulate the model’s intermediate thinking in two ways: syntactic prompting, which encourages the model to analyze the syntactic structure itself, and tool augmentation, which provides the model with the syntactic information generated by a parsing tool. Besides, we adapt self-consistency to NER by proposing a two-stage majority voting strategy, which first votes for the most consistent mentions, then the most consistent types. The proposed methods achieve remarkable improvements for zero-shot NER across seven benchmarks, including Chinese and English datasets, and on both domain-specific and general-domain scenarios. In addition, we present a comprehensive analysis of the error types with suggestions for optimization directions. We also verify the effectiveness of the proposed methods on the few-shot setting and other LLMs.
pdf
bib
abs
A Class-Rebalancing Self-Training Framework for Distantly-Supervised Named Entity Recognition
Qi Li
|
Tingyu Xie
|
Peng Peng
|
Hongwei Wang
|
Gaoang Wang
Findings of the Association for Computational Linguistics: ACL 2023
Distant supervision reduces the reliance on human annotation in the named entity recognition tasks. The class-level imbalanced distant annotation is a realistic and unexplored problem, and the popular method of self-training can not handle class-level imbalanced learning. More importantly, self-training is dominated by the high-performance class in selecting candidates, and deteriorates the low-performance class with the bias of generated pseudo label. To address the class-level imbalance performance, we propose a class-rebalancing self-training framework for improving the distantly-supervised named entity recognition. In candidate selection, a class-wise flexible threshold is designed to fully explore other classes besides the high-performance class. In label generation, injecting the distant label, a hybrid pseudo label is adopted to provide straight semantic information for the low-performance class. Experiments on five flat and two nested datasets show that our model achieves state-of-the-art results. We also conduct extensive research to analyze the effectiveness of the flexible threshold and the hybrid pseudo label.
2021
pdf
bib
abs
QA-Driven Zero-shot Slot Filling with Weak Supervision Pretraining
Xinya Du
|
Luheng He
|
Qi Li
|
Dian Yu
|
Panupong Pasupat
|
Yuan Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Slot-filling is an essential component for building task-oriented dialog systems. In this work, we focus on the zero-shot slot-filling problem, where the model needs to predict slots and their values, given utterances from new domains without training on the target domain. Prior methods directly encode slot descriptions to generalize to unseen slot types. However, raw slot descriptions are often ambiguous and do not encode enough semantic information, limiting the models’ zero-shot capability. To address this problem, we introduce QA-driven slot filling (QASF), which extracts slot-filler spans from utterances with a span-based QA model. We use a linguistically motivated questioning strategy to turn descriptions into questions, allowing the model to generalize to unseen slot types. Moreover, our QASF model can benefit from weak supervision signals from QA pairs synthetically generated from unlabeled conversations. Our full system substantially outperforms baselines by over 5% on the SNIPS benchmark.
pdf
bib
abs
Few-shot Intent Classification and Slot Filling with Retrieved Examples
Dian Yu
|
Luheng He
|
Yuan Zhang
|
Xinya Du
|
Panupong Pasupat
|
Qi Li
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Few-shot learning arises in important practical scenarios, such as when a natural language understanding system needs to learn new semantic labels for an emerging, resource-scarce domain. In this paper, we explore retrieval-based methods for intent classification and slot filling tasks in few-shot settings. Retrieval-based methods make predictions based on labeled examples in the retrieval index that are similar to the input, and thus can adapt to new domains simply by changing the index without having to retrain the model. However, it is non-trivial to apply such methods on tasks with a complex label space like slot filling. To this end, we propose a span-level retrieval method that learns similar contextualized representations for spans with the same label via a novel batch-softmax objective. At inference time, we use the labels of the retrieved spans to construct the final structure with the highest aggregated score. Our method outperforms previous systems in various few-shot settings on the CLINC and SNIPS benchmarks.
2019
pdf
bib
abs
Improving Relation Extraction with Knowledge-attention
Pengfei Li
|
Kezhi Mao
|
Xuefeng Yang
|
Qi Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
While attention mechanisms have been proven to be effective in many NLP tasks, majority of them are data-driven. We propose a novel knowledge-attention encoder which incorporates prior knowledge from external lexical resources into deep neural networks for relation extraction task. Furthermore, we present three effective ways of integrating knowledge-attention with self-attention to maximize the utilization of both knowledge and data. The proposed relation extraction system is end-to-end and fully attention-based. Experiment results show that the proposed knowledge-attention mechanism has complementary strengths with self-attention, and our integrated models outperform existing CNN, RNN, and self-attention based models. State-of-the-art performance is achieved on TACRED, a complex and large-scale relation extraction dataset.
2016
pdf
bib
Discourse Parsing with Attention-based Hierarchical Neural Networks
Qi Li
|
Tianshi Li
|
Baobao Chang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
pdf
bib
Recognizing Salient Entities in Shopping Queries
Zornitsa Kozareva
|
Qi Li
|
Ke Zhai
|
Weiwei Guo
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2015
pdf
bib
Seed-Based Event Trigger Labeling: How far can event descriptions get us?
Ofer Bronstein
|
Ido Dagan
|
Qi Li
|
Heng Ji
|
Anette Frank
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
2014
pdf
bib
Constructing Information Networks Using One Single Model
Qi Li
|
Heng Ji
|
Yu Hong
|
Sujian Li
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
pdf
bib
abs
Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese
Haibo Li
|
Masato Hagiwara
|
Qi Li
|
Heng Ji
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Word Segmentation is usually considered an essential step for many Chinese and Japanese Natural Language Processing tasks, such as name tagging. This paper presents several new observations and analysis on the impact of word segmentation on name tagging; (1). Due to the limitation of current state-of-the-art Chinese word segmentation performance, a character-based name tagger can outperform its word-based counterparts for Chinese but not for Japanese; (2). It is crucial to keep segmentation settings (e.g. definitions, specifications, methods) consistent between training and testing for name tagging; (3). As long as (2) is ensured, the performance of word segmentation does not have appreciable impact on Chinese and Japanese name tagging.
pdf
bib
Incremental Joint Extraction of Entity Mentions and Relations
Qi Li
|
Heng Ji
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2013
pdf
bib
Joint Event Extraction via Structured Prediction with Global Features
Qi Li
|
Heng Ji
|
Liang Huang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
pdf
bib
Name-aware Machine Translation
Haibo Li
|
Jing Zheng
|
Heng Ji
|
Qi Li
|
Wen Wang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2011
pdf
bib
Cross-lingual Slot Filling from Comparable Corpora
Matthew Snover
|
Xiang Li
|
Wen-Pin Lin
|
Zheng Chen
|
Suzanne Tamang
|
Mingmin Ge
|
Adam Lee
|
Qi Li
|
Hao Li
|
Sam Anzaroot
|
Heng Ji
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web