2025
pdf
bib
abs
Personalized Text Generation with Contrastive Activation Steering
Jinghao Zhang
|
Yuting Liu
|
Wenjie Wang
|
Qiang Liu
|
Shu Wu
|
Liang Wang
|
Tat-Seng Chua
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Personalized text generation aims to infer users’ writing style preferences from their historical texts and generate outputs that faithfully reflect these stylistic characteristics. Existing solutions primarily adopt two paradigms: retrieval-augmented generation (RAG) and parameter-efficient fine-tuning (PEFT). While these approaches have advanced the field, they suffer from two critical limitations: (1) the entanglement of content semantics and stylistic patterns in historical texts impedes accurate modeling of user-specific writing preferences; and (2) scalability challenges arising from both RAG’s inference latency by retrieval operations and PEFT’s parameter storage requirements for per user model. To overcome these limitations, we propose StyleVector, a training-free framework that disentangles and represents personalized writing style as a vector in LLM’s activation space, enabling style-steered generation during inference without requiring costly retrieval or parameter storage. Comprehensive experiments demonstrate that our framework achieves a significant 8% relative improvement in personalized generation while reducing storage requirements by 1700 × over PEFT method.
pdf
bib
abs
Length Controlled Generation for Black-box LLMs
Yuxuan Gu
|
Wenjie Wang
|
Xiaocheng Feng
|
Weihong Zhong
|
Kun Zhu
|
Lei Huang
|
Ting Liu
|
Bing Qin
|
Tat-Seng Chua
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have demonstrated impressive instruction following capabilities, while still struggling to accurately manage the length of the generated text, which is a fundamental requirement in many real-world applications. Existing length control methods involve fine-tuning the parameters of LLMs, which is inefficient and suboptimal for practical use. In this paper, we propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy. This framework efficiently and reliably regulates LLMs to generate length-constrained text without modifying the underlying parameters, thereby preserving the original capabilities of LLMs. Experimental results demonstrate that our framework achieves almost 100% success rates of length control on Llama3.1 for tasks such as length-controlled abstractive summarization and length-constrained instruction following, with minimal additional computational overhead. This also highlights the significant potential of our method for precise length control across a broader range of applications, without compromising the versatility of LLMs.
pdf
bib
abs
Tunable LLM-based Proactive Recommendation Agent
Mingze Wang
|
Chongming Gao
|
Wenjie Wang
|
Yangyang Li
|
Fuli Feng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recommender systems are indispensable on various digital platforms. However, traditional methods often reinforce existing user interests, which leads to echo chambers and limits diversity. Proactive Recommendation Systems (PRS) aim to address this issue by cultivating users’ latent interests through multi-step recommendations. Despite advancements, challenges persist particularly in optimizing long-term rewards and adapting to real-time user feedback. In this study, we propose an LLM-based Actor-Critic Agent framework to enhance PRS. This framework utilizes the LLM-based agent to adjust recommendations in real time based on feedback and employs agent-tuning methods to optimize long-term rewards using three proposed reward functions. Extensive experiments validate the significant superiority of this framework over existing methods by optimizing long-term rewards and dynamically evolving with user feedback.
pdf
bib
abs
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
|
Jinghao Zhang
|
Alireza Salemi
|
Xinting Hu
|
Wenjie Wang
|
Fuli Feng
|
Hamed Zamani
|
Xiangnan He
|
Tat-Seng Chua
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In the era of large models, content generation is gradually shifting to Personalized Generation (PGen), tailoring content to individual preferences and needs. This paper presents the first comprehensive survey on PGen, investigating existing research in this rapidly growing field. We conceptualize PGen from a unified perspective, systematically formalizing its key components, core objectives, and abstract workflows. Based on this unified perspective, we propose a multi-level taxonomy, offering an in-depth review of technical advancements, commonly used datasets, and evaluation metrics across multiple modalities, personalized contexts, and tasks. Moreover, we envision the potential applications of PGen and highlight open challenges and promising directions for future exploration. By bridging PGen research across multiple modalities, this survey serves as a valuable resource for fostering knowledge sharing and interdisciplinary collaboration, ultimately contributing to a more personalized digital landscape.
pdf
bib
abs
Consistency-Aware Online Multi-Objective Alignment for Related Search Query Generation
Shuxian Bi
|
Chongming Gao
|
Wenjie Wang
|
Yueqi Mou
|
Chenxu Wang
|
Tang Biao
|
Peng Yan
|
Fuli Feng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Modern digital platforms rely on related search query recommendations to enhance engagement, yet existing methods fail to reconcile click-through rate (CTR) optimization with topic expansion. We propose **CMAQ**, a **C**onsistent **M**ulti-Objective **A**ligned **Q**uery generation framework that harmonizes these goals through three components: (1) reward modeling to quantify objectives, (2) style alignment for format compliance, and (3) consistency-aware optimization to coordinate joint improvements. CMAQ employs adaptive 𝛽-scaled DPO with geometric mean rewards, balancing CTR and expansion while mitigating objective conflicts. Extensive offline and online evaluations in a large-scale industrial setting demonstrate CMAQ’s superiority, achieving significant CTR gains (+2.3%) and higher human-rated query quality compared to state-of-the-art methods. Our approach enables high-quality query generation while sustaining user engagement and platform ecosystem health.
pdf
bib
abs
Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs
Yi Fang
|
Moxin Li
|
Wenjie Wang
|
Lin Hui
|
Fuli Feng
Proceedings of the 31st International Conference on Computational Linguistics
Large Language Models (LLMs) excel in various natural language processing tasks but struggle with hallucination issues. Existing solutions have considered utilizing LLMs’ inherent reasoning abilities to alleviate hallucination, such as self-correction and diverse sampling methods. However, these methods often overtrust LLMs’ initial answers due to inherent biases. The key to alleviating this issue lies in overriding LLMs’ inherent biases for answer inspection. To this end, we propose a CounterFactual Multi-Agent Debate (CFMAD) framework. CFMAD presets the stances of LLMs to override their inherent biases by compelling LLMs to generate justifications for a predetermined answer’s correctness. The LLMs with different predetermined stances are engaged with a skeptical critic for counterfactual debate on the rationality of generated justifications. Finally, the debate process is evaluated by a third-party judge to determine the final answer. Extensive experiments on four datasets of three tasks demonstrate the superiority of CFMAD over existing methods.
pdf
bib
abs
A Federated Framework for LLM-based Recommendation
Jujia Zhao
|
Wenjie Wang
|
Chen Xu
|
See-Kiong Ng
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: NAACL 2025
Large Language Models (LLMs) have showcased their potential in building generative recommendation systems through fine-tuning user behavior data. However, utilizing the user behavior data may pose significant privacy risks like in the traditional recommender models, potentially leading to ethical dilemmas and violations of data protection regulations. To address the privacy concerns, Federated Learning for Recommendation (Fed4Rec) has been identified as a promising solution. However, directly applying Fed4Rec in the LLM context introduces two challenges: 1) exacerbated client performance imbalance, which ultimately impacts the system’s long-term effectiveness, and 2) substantial client resource costs, posing a high demand for clients’ both computational and storage capability to locally train and infer LLMs.To tackle these challenges, we propose a federated framework for LLM-based recommendation (shorted as FELLRec). Generally, FELLRec designs two key strategies. 1) Dynamic balance strategy, which designs dynamic parameter aggregation and learning speed for different clients during training, aiming to ensure relatively balanced performance across clients. 2) Flexible storage strategy, which selectively retains certain sensitive LLM layers on the client side, while offloading other layers to the server, aiming to preserve privacy while saving resources. Specifically, FELLRec flexibly maintains those input and output layers on the client side to ensure the protection of all sensitive information. Experiment results show that FELLRec can achieve a more balanced client performance and improved overall performance in a computational and storage-efficient way while safeguarding user privacy well.
pdf
bib
abs
HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning
Xiaoyuan Li
|
Moxin Li
|
Rui Men
|
Yichang Zhang
|
Keqin Bao
|
Wenjie Wang
|
Fuli Feng
|
Dayiheng Liu
|
Junyang Lin
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have shown remarkable capabilities in commonsense reasoning; however, some variations in questions can trigger incorrect responses. Do these models truly understand commonsense knowledge, or just memorize expression patterns? To investigate this question, we present the first extensive robustness evaluation of LLMs in commonsense reasoning. We introduce HellaSwag-Pro, a large-scale bilingual benchmark consisting of 11,200 cases, by designing and compiling seven types of question variants. To construct this benchmark, we propose a two-stage method to develop Chinese HellaSwag, a finely annotated dataset comprising 12,000 instances across 56 categories. We conduct extensive experiments on 41 representative LLMs, revealing that these LLMs are far from robust in commonsense reasoning. Furthermore, this robustness varies depending on the language in which the LLM is tested. This work establishes a high-quality evaluation benchmark, with extensive experiments offering valuable insights to the community in commonsense reasoning for LLMs.
pdf
bib
abs
Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment
Moxin Li
|
Yuantao Zhang
|
Wenjie Wang
|
Wentao Shi
|
Zhuo Liu
|
Fuli Feng
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL 2025
Multi-Objective Alignment (MOA) aims to align LLMs’ responses with multiple human preference objectives, with Direct Preference Optimization (DPO) emerging as a prominent approach. However, we find that DPO-based MOA approaches suffer from widespread preference conflicts in the data, where different objectives favor different responses. This results in conflicting optimization directions, hindering the optimization on the Pareto Front. To address this, we propose to construct Pareto-optimal responses to resolve preference conflicts. To efficiently obtain and utilize such responses, we propose a self-improving DPO framework that enables LLMs to self-generate and select Pareto-optimal responses for self-supervised preference alignment. Extensive experiments on two datasets demonstrate the superior Pareto Front achieved by our framework compared to various baselines
pdf
bib
abs
DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing
Yi Wang
|
Fenghua Weng
|
Sibei Yang
|
Zhan Qin
|
Minlie Huang
|
Wenjie Wang
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) are widely applied in decision making, but their deployment is threatened by jailbreak attacks, where adversarial users manipulate model behavior to bypass safety measures. Existing defense mechanisms, such as safety fine-tuning and model editing, either require extensive parameter modifications or lack precision, leading to performance degradation on general tasks, which is unsuitable to post-deployment safety alignment. To address these challenges, we propose DELMAN (**D**ynamic **E**diting for **L**L**M**s J**A**ilbreak Defe**N**se), a novel approach leveraging direct model editing for precise, dynamic protection against jailbreak attacks. DELMAN directly updates a minimal set of relevant parameters to neutralize harmful behaviors while preserving the model’s utility. To avoid triggering a safe response in benign context, we incorporate KL-divergence regularization to ensure the updated model remains consistent with the original model when processing benign queries. Experimental results demonstrate that DELMAN outperforms baseline methods in mitigating jailbreak attacks while preserving the model’s utility, and adapts seamlessly to new attack instances, providing a practical and efficient solution for post-deployment model protection.
pdf
bib
abs
Customizing In-context Learning for Dynamic Interest Adaption in LLM-based Recommendation
Keqin Bao
|
Ming Yan
|
Yang Zhang
|
Jizhi Zhang
|
Wenjie Wang
|
Fuli Feng
|
Xiangnan He
Findings of the Association for Computational Linguistics: ACL 2025
Frequently updating Large Language Model (LLM)-based recommender systems to adapt to dynamic user interests—as done for traditional ones—is impractical due to high training costs, even with acceleration methods. This work explores the possibility of adapting the model to dynamic user interests without any model-level updates via In-context Learning (ICL), which enables adaptation through few-shot examples within input prompts. While using recent user interactions as ICL demonstrations offers a potential solution for dynamic interest adaptation, existing LLM-based recommenders face critical limitations: recommendation-specific tuning often diminishes the model’s in-context learning ability, and the original LLM’s ICL lacks task-specific optimization for recommendations. To bridge this gap, we introduce RecICL, a framework that establishes recommendation-oriented in-context learning by structuring recent user interactions and current inputs into ICL formats. RecICL achieves dual objectives: (1) preserving fundamental ICL capabilities during recommendation adaptation and (2) dynamically capturing user preference evolution through the most recent interactions. Extensive experiments across multiple benchmarks demonstrate RecICL’s superior performance, achieving better results without model updates. Our implementation is publicly available at
https://anonymous.4open.science/r/RecICL-8003.
pdf
bib
abs
Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization
Yilun Qiu
|
Xiaoyan Zhao
|
Yang Zhang
|
Yimeng Bai
|
Wenjie Wang
|
Hong Cheng
|
Fuli Feng
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL 2025
Personalizing Large Language Models (LLMs) has become a critical step in facilitating their widespread application to enhance individual life experiences. In pursuit of personalization, distilling key preference information from an individual’s historical data as instructional preference context to customize LLM generation has emerged as a promising direction. However, these methods face a fundamental limitation by overlooking the inter-user comparative analysis, which is essential for identifying the inter-user differences that truly shape preferences. To address this limitation, we propose Difference-aware Personalization Learning (DPL), a novel approach that emphasizes extracting inter-user differences to enhance LLM personalization. DPL strategically selects representative users for comparison and establishes a structured standard to extract meaningful, task-relevant differences for customizing LLM generation. Extensive experiments on real-world datasets demonstrate that DPL significantly enhances LLM personalization. We release our code at
https://github.com/SnowCharmQ/DPL.
pdf
bib
abs
Don’t Say No: Jailbreaking LLM by Suppressing Refusal
Yukai Zhou
|
Jian Lou
|
Zhijie Huang
|
Zhan Qin
|
Sibei Yang
|
Wenjie Wang
Findings of the Association for Computational Linguistics: ACL 2025
Ensuring the safety alignment of Large Language Models (LLMs) is critical for generating responses consistent with human values. However, LLMs remain vulnerable to jailbreaking attacks, where carefully crafted prompts manipulate them into producing toxic content. One category of such attacks reformulates the task as an optimization problem, aiming to elicit affirmative responses from the LLM. However, these methods heavily rely on predefined objectionable behaviors, limiting their effectiveness and adaptability to diverse harmful queries. In this study, we first identify why the vanilla target loss is suboptimal and then propose enhancements to the loss objective. We introduce DSN (Don’t Say No) attack, which combines a cosine decay schedule method with refusal suppression to achieve higher success rates. Extensive experiments demonstrate that DSN outperforms baseline attacks and achieves state-of-the-art attack success rates (ASR). DSN also shows strong universality and transferability to unseen datasets and black-box models.
2024
pdf
bib
abs
Text-like Encoding of Collaborative Information in Large Language Models for Recommendation
Yang Zhang
|
Keqin Bao
|
Ming Yan
|
Wenjie Wang
|
Fuli Feng
|
Xiangnan He
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
When adapting Large Language Models for Recommendation (LLMRec), it is crucial to integrate collaborative information. Existing methods achieve this by learning collaborative embeddings in LLMs’ latent space from scratch or by mapping from external models. However, they fail to represent the information in a text-like format, which may not align optimally with LLMs. To bridge this gap, we introduce BinLLM, a novel LLMRec method that seamlessly integrates collaborative information through text-like encoding. BinLLM converts collaborative embeddings from external models into binary sequences — a specific text format that LLMs can understand and operate on directly, facilitating the direct usage of collaborative information in text-like format by LLMs. Additionally, BinLLM provides options to compress the binary sequence using dot-decimal notation to avoid excessively long lengths. Extensive experiments validate that BinLLM introduces collaborative information in a manner better aligned with LLMs, resulting in enhanced performance. We release our code at https://github.com/zyang1580/BinLLM.
pdf
bib
abs
Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
Yongqi Li
|
Wenjie Wang
|
Leigang Qu
|
Liqiang Nie
|
Wenjie Li
|
Tat-Seng Chua
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The recent advancements in generative language models have demonstrated their ability to memorize knowledge from documents and recall knowledge to respond to user queries effectively. Building upon this capability, we propose to enable multimodal large language models (MLLMs) to memorize and recall images within their parameters. Given a user query for visual content, the MLLM is anticipated to “recall” the relevant image from its parameters as the response. Achieving this target presents notable challenges, including inbuilt visual memory and visual recall schemes within MLLMs. To address these challenges, we introduce a generative cross-modal retrieval framework, which assigns unique identifier strings to represent images and involves two training steps: learning to memorize and learning to retrieve. The first step focuses on training the MLLM to memorize the association between images and their respective identifiers. The latter step teaches the MLLM to generate the corresponding identifier of the target image, given the textual query input. By memorizing images in MLLMs, we introduce a new paradigm to cross-modal retrieval, distinct from previous discriminative approaches. The experiments demonstrate that the generative paradigm performs effectively and efficiently even with large-scale image candidate sets.
pdf
bib
abs
Distillation Enhanced Generative Retrieval
Yongqi Li
|
Zhen Zhang
|
Wenjie Wang
|
Liqiang Nie
|
Wenjie Li
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL 2024
Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target. This paradigm leverages powerful generative language models, distinct from traditional sparse or dense retrieval methods. In this work, we identify a viable direction to further enhance generative retrieval via distillation and propose a feasible framework, named DGR. DGR utilizes sophisticated ranking models, such as the cross-encoder, in a teacher role to supply a passage rank list, which captures the varying relevance degrees of passages instead of binary hard labels; subsequently, DGR employs a specially designed distilled RankNet loss to optimize the generative retrieval model, considering the passage rank order provided by the teacher model as labels. This framework only requires an additional distillation step to enhance current generative retrieval systems and does not add any burden to the inference stage. We conduct experiments on four public datasets, and the results indicate that DGR achieves state-of-the-art performance among the generative retrieval methods. Additionally, DGR demonstrates exceptional robustness and generalizability with various teacher models and distillation losses.
pdf
bib
abs
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li
|
Wenjie Wang
|
Moxin Li
|
Junrong Guo
|
Yang Zhang
|
Fuli Feng
Findings of the Association for Computational Linguistics: ACL 2024
The rapid advancement of Large Language Models (LLMs) in the realm of mathematical reasoning necessitates comprehensive evaluations to gauge progress and inspire future directions. Existing assessments predominantly focus on problem-solving from the examinee perspective, overlooking a dual perspective of examiner regarding error identification and correction.From the examiner perspective, we define four evaluation tasks for error identification and correction along with a new dataset with annotated error types and steps. We also design diverse prompts to thoroughly evaluate eleven representative LLMs. Our principal findings indicate that GPT-4 outperforms all models, while open-source model LLaMA-2-7B demonstrates comparable abilities to closed-source models GPT-3.5 and Gemini Pro.Notably, calculation error proves the most challenging error type. Moreover, prompting LLMs with the error types can improve the average correction accuracy by 47.9%. These results reveal potential directions for developing the mathematical reasoning abilities of LLMs.Our code and dataset is available on https://github.com/LittleCirc1e/EIC.
pdf
bib
abs
A Study of Implicit Ranking Unfairness in Large Language Models
Chen Xu
|
Wenjie Wang
|
Yuxin Li
|
Liang Pang
|
Jun Xu
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: EMNLP 2024
Recently, Large Language Models (LLMs) have demonstrated a superior ability to serve as ranking models. However, concerns have arisen as LLMs will exhibit discriminatory ranking behaviors based on users’ sensitive attributes (gender). Worse still, in this paper, we identify a subtler form of discrimination in LLMs, termed implicit ranking unfairness, where LLMs exhibit discriminatory ranking patterns based solely on non-sensitive user profiles, such as user names. Such implicit unfairness is more widespread but less noticeable, threatening the ethical foundation. To comprehensively explore such unfairness, our analysis will focus on three research aspects: (1) We propose an evaluation method to investigate the severity of implicit ranking unfairness. (2) We uncover the reasons for causing such unfairness. (3) To mitigate such unfairness effectively, we utilize a pair-wise regression method to conduct fair-aware data augmentation for LLM fine-tuning. The experiment demonstrates that our method outperforms existing approaches in ranking fairness, achieving this with only a small reduction in accuracy. Lastly, we emphasize the need for the community to identify and mitigate the implicit unfairness, aiming to avert the potential deterioration in the reinforced human-LLMs ecosystem deterioration.
pdf
bib
abs
Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
Moxin Li
|
Wenjie Wang
|
Fuli Feng
|
Fengbin Zhu
|
Qifan Wang
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: EMNLP 2024
Self-detection for Large Language Models (LLMs) seeks to evaluate the trustworthiness of the LLM’s output by leveraging its own capabilities, thereby alleviating the issue of output hallucination. However, existing self-detection approaches only retrospectively evaluate answers generated by LLM, typically leading to the over-trust in incorrectly generated answers. To tackle this limitation, we propose a novel self-detection paradigm that considers the comprehensive answer space beyond LLM-generated answers. It thoroughly compares the trustworthiness of multiple candidate answers to mitigate the over-trust in LLM-generated incorrect answers. Building upon this paradigm, we introduce a two-step framework, which firstly instructs LLM to reflect and provide justifications for each candidate answer, and then aggregates the justifications for comprehensive target answer evaluation. This framework can be seamlessly integrated with existing approaches for superior self-detection. Extensive experiments on six datasets spanning three tasks demonstrate the effectiveness of the proposed framework.
pdf
bib
Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models
Yue Xu
|
Xiuyuan Qi
|
Zhan Qin
|
Wenjie Wang
Findings of the Association for Computational Linguistics: EMNLP 2024
pdf
bib
abs
LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models
Yue Xu
|
Wenjie Wang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Prompt-based learning is a new language model training paradigm that adapts the Pre-trained Language Models (PLMs) to downstream tasks, which revitalizes the performance benchmarks across various natural language processing (NLP) tasks. Instead of using a fixed prompt template to fine-tune the model, some research demonstrates the effectiveness of searching for the prompt via optimization. Such prompt optimization process of prompt-based learning on PLMs also gives insight into generating adversarial prompts to mislead the model, raising concerns about the adversarial vulnerability of this paradigm. Recent studies have shown that universal adversarial triggers (UATs) can be generated to alter not only the predictions of the target PLMs but also the prediction of corresponding Prompt-based Fine-tuning Models (PFMs) under the prompt-based learning paradigm. However, UATs found in previous works are often unreadable tokens or characters and can be easily distinguished from natural texts with adaptive defenses. In this work, we consider the naturalness of the UATs and develop LinkPrompt, an adversarial attack algorithm to generate UATs by a gradient-based beam search algorithm that not only effectively attacks the target PLMs and PFMs but also maintains the naturalness among the trigger tokens. Extensive results demonstrate the effectiveness of LinkPrompt, as well as the transferability of UATs generated by LinkPrompt to open-sourced Large Language Model (LLM) Llama2 and API-accessed LLM GPT-3.5-turbo. The resource is available at https://github.com/SavannahXu79/LinkPrompt.
2023
pdf
bib
abs
Counterfactual Active Learning for Out-of-Distribution Generalization
Xun Deng
|
Wenjie Wang
|
Fuli Feng
|
Hanwang Zhang
|
Xiangnan He
|
Yong Liao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We study the out-of-distribution generalization of active learning that adaptively selects samples for annotation in learning the decision boundary of classification. Our empirical study finds that increasingly annotating seen samples may hardly benefit the generalization. To address the problem, we propose Counterfactual Active Learning (CounterAL) that empowers active learning with counterfactual thinking to bridge the seen samples with unseen cases. In addition to annotating factual samples, CounterAL requires annotators to answer counterfactual questions to construct counterfactual samples for training. To achieve CounterAL, we design a new acquisition strategy that selects the informative factual-counterfactual pairs for annotation; and a new training strategy that pushes the model update to focus on the discrepancy between factual and counterfactual samples. We evaluate CounterAL on multiple public datasets of sentiment analysis and natural language inference. The experiment results show that CounterAL requires fewer acquisition rounds and outperforms existing active learning methods by a large margin in OOD tests with comparable IID performance.
pdf
bib
abs
Robust Prompt Optimization for Large Language Models Against Distribution Shifts
Moxin Li
|
Wenjie Wang
|
Fuli Feng
|
Yixin Cao
|
Jizhi Zhang
|
Tat-Seng Chua
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Large Language Model (LLM) has demonstrated significant ability in various Natural Language Processing tasks. However, their effectiveness is highly dependent on the phrasing of the task prompt, leading to research on automatic prompt optimization using labeled task data. We reveal that these prompt optimization techniques are vulnerable to distribution shifts such as subpopulation shifts, which are common for LLMs in real-world scenarios such as customer reviews analysis. In this light, we propose a new problem of robust prompt optimization for LLMs against distribution shifts, which requires the prompt optimized over the labeled source group can simultaneously generalize to an unlabeled target group. To solve this problem, we propose Generalized Prompt Optimization framework , which incorporates the unlabeled data from the target group into prompt optimization. Extensive experimental results demonstrate the effectiveness of the proposed framework with significant performance improvement on the target group and comparable performance on the source group.
pdf
bib
abs
Hypothetical Training for Robust Machine Reading Comprehension of Tabular Context
Moxin Li
|
Wenjie Wang
|
Fuli Feng
|
Hanwang Zhang
|
Qifan Wang
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL 2023
Machine Reading Comprehension (MRC) models easily learn spurious correlations from complex contexts such as tabular data. Counterfactual training—using the factual and counterfactual data by augmentation—has become a promising solution. However, it is costly to construct faithful counterfactual examples because it is tricky to maintain the consistency and dependency of the tabular data. In this paper, we take a more efficient fashion to ask hypothetical questions like “in which year would the net profit be larger if the revenue in 2019 were $38,298?”, whose effects on the answers are equivalent to those expensive counterfactual tables. We propose a hypothetical training framework that uses paired examples with different hypothetical questions to supervise the direction of model gradient towards the counterfactual answer change. The superior generalization results on tabular MRC datasets, including a newly constructed stress test and MultiHiertt, validate our effectiveness.
pdf
bib
abs
Attack Prompt Generation for Red Teaming and Defending Large Language Models
Boyi Deng
|
Wenjie Wang
|
Fuli Feng
|
Yang Deng
|
Qifan Wang
|
Xiangnan He
Findings of the Association for Computational Linguistics: EMNLP 2023
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content. Previous research constructs attack prompts via manual or automatic methods, which have their own limitations on construction cost and quality. To address these issues, we propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts. Specifically, considering the impressive capabilities of newly emerged LLMs, we propose an attack framework to instruct LLMs to mimic human-generated prompts through in-context learning. Furthermore, we propose a defense framework that fine-tunes victim LLMs through iterative interactions with the attack framework to enhance their safety against red teaming attacks. Extensive experiments on different LLMs validate the effectiveness of our proposed attack and defense frameworks. Additionally, we release a series of attack prompts datasets named SAP with varying sizes, facilitating the safety evaluation and enhancement of more LLMs.
2021
pdf
bib
abs
Certified Robustness to Word Substitution Attack with Differential Privacy
Wenjie Wang
|
Pengfei Tang
|
Jian Lou
|
Li Xiong
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
The robustness and security of natural language processing (NLP) models are significantly important in real-world applications. In the context of text classification tasks, adversarial examples can be designed by substituting words with synonyms under certain semantic and syntactic constraints, such that a well-trained model will give a wrong prediction. Therefore, it is crucial to develop techniques to provide a rigorous and provable robustness guarantee against such attacks. In this paper, we propose WordDP to achieve certified robustness against word substitution at- tacks in text classification via differential privacy (DP). We establish the connection between DP and adversarial robustness for the first time in the text domain and propose a conceptual exponential mechanism-based algorithm to formally achieve the robustness. We further present a practical simulated exponential mechanism that has efficient inference with certified robustness. We not only provide a rigorous analytic derivation of the certified condition but also experimentally compare the utility of WordDP with existing defense algorithms. The results show that WordDP achieves higher accuracy and more than 30X efficiency improvement over the state-of-the-art certified robustness mechanism in typical text classification tasks.
2020
pdf
bib
abs
Utilizing Multimodal Feature Consistency to Detect Adversarial Examples on Clinical Summaries
Wenjie Wang
|
Youngja Park
|
Taesung Lee
|
Ian Molloy
|
Pengfei Tang
|
Li Xiong
Proceedings of the 3rd Clinical Natural Language Processing Workshop
Recent studies have shown that adversarial examples can be generated by applying small perturbations to the inputs such that the well- trained deep learning models will misclassify. With the increasing number of safety and security-sensitive applications of deep learn- ing models, the robustness of deep learning models has become a crucial topic. The robustness of deep learning models for health- care applications is especially critical because the unique characteristics and the high financial interests of the medical domain make it more sensitive to adversarial attacks. Among the modalities of medical data, the clinical summaries have higher risks to be attacked because they are generated by third-party companies. As few works studied adversarial threats on clinical summaries, in this work we first apply adversarial attack to clinical summaries of electronic health records (EHR) to show the text-based deep learning systems are vulnerable to adversarial examples. Secondly, benefiting from the multi-modality of the EHR dataset, we propose a novel defense method, MATCH (Multimodal feATure Consistency cHeck), which leverages the consistency between multiple modalities in the data to defend against adversarial examples on a single modality. Our experiments demonstrate the effectiveness of MATCH on a hospital readmission prediction task comparing with baseline methods.
2018
pdf
bib
abs
Enchancing the Collaborative Interlingual Index for Digital Humanities: Cross-linguistic Analysis in the Domain of Theology
Laura Slaughter
|
Wenjie Wang
|
Luis Morgado Da Costa
|
Francis Bond
Proceedings of the 9th Global Wordnet Conference
We aim to support digital humanities work related to the study of sacred texts. To do this, we propose to build a cross-lingual wordnet within the do-main of theology. We target the Collaborative Interlingual Index (CILI) directly instead of each individual wordnet. The paper presents background for this proposal: (1) an overview of concepts relevant to theology and (2) a summary of the domain-associated issues observed in the Princeton WordNet (PWN). We have found that definitions for concepts in this domain can be too restrictive, inconsistent, and unclear. Necessary synsets are missing, with the PWN being skewed towards Christianity. We argue that tackling problems in a single domain is a better method for improving CILI. By focusing on a single topic rather than a single language, this will result in the proper construction of definitions, romanization/translation of lemmas, and also improvements in use of/creation of a cross-lingual domain hierarchy.