Jing Wang

2025

Direct Prompt Injection (DPI) attacks pose a critical security threat to Large Language Models (LLMs) due to their low barrier of execution and high potential damage. To address the impracticality of existing white-box/gray-box methods and the poor transferability of black-box methods, we propose an activations-guided prompt injection attack framework. We first construct an Energy-based Model (EBM) using activations from a surrogate model to evaluate the quality of adversarial prompts. Guided by the trained EBM, we employ the token-level Markov Chain Monte Carlo (MCMC) sampling to adaptively optimize adversarial prompts, thereby enabling gradient-free black-box attacks. Experimental results demonstrate our superior cross-model transferability, achieving 49.6% attack success rate (ASR) across five mainstream LLMs and 34.6% improvement over human-crafted prompts, and maintaining 36.6% ASR on unseen task scenarios. Interpretability analysis reveals a correlation between activations and attack effectiveness, highlighting the critical role of semantic patterns in transferable vulnerability exploitation.

Existing research in multi-hop questions has identified two reasoning modes: latent reasoning and factual shortcuts, but has not deeply investigated how these modes differ during inference. This impacts both model generalization ability and downstream reasoning tasks. In this work, we systematically examine these distinctions and propose a simple and efficient classification metric, Attribute Rate Ratio (ARR). First, we construct specialized datasets corresponding to the two reasoning modes based on our proposed criteria. Then, using reverse engineering methods, including attention knockout and logit lens techniques, we reveal that subject representations differ significantly across modes: latent reasoning encodes bridge-related information for final answer extraction, while factual shortcuts bypass intermediate reasoning and resemble single-hop factual queries. Finally, our proposed ARR achieves around 90% accuracy on our datasets and demonstrates effectiveness in RAG conflict scenarios, showing that model behavior under conflicting prompts is closely tied to its underlying reasoning mode. Our findings and proposed metric have significant potential for advancing LLM development and applications.

Large language models (LLMs) generate human-aligned content under certain safety constraints. However, the current known technique “jailbreak prompt” can circumvent safety-aligned measures and induce LLMs to output malicious content. Research on Jailbreaking can help identify vulnerabilities in LLMs and guide the development of robust security frameworks. To circumvent the issue of attack templates becoming obsolete as models evolve, existing methods adopt iterative mutation and dynamic optimization to facilitate more automated jailbreak attacks. However, these methods face two challenges: inefficiency and repetitive optimization, as they overlook the value of past attack experiences. To better integrate past attack experiences to assist current jailbreak attempts, we propose the JailExpert, an automated jailbreak framework, which is the first to achieve a formal representation of experience structure, group experiences based on semantic drift, and support the dynamic updating of the experience pool. Extensive experiments demonstrate that JailExpert significantly improves both attack effectiveness and efficiency. Compared to the current state-of-the-art black-box jailbreak method, JailExpert achieves an average increase of 24% in attack success rate and 2.7 times improvement in attack efficiency.

pdf bib abs
SynC-LLM: Generation of Large-Scale Synthetic Circuit Code with Hierarchical Language Models
Shang Liu | Yao Lu | Wenji Fang | Jing Wang | Zhiyao Xie
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

In recent years, AI-assisted integrated circuit (IC) design methods have shown great potential in boosting IC design efficiency. However, this emerging technique is fundamentally limited by the serious scarcity of publicly accessible large-scale circuit design data, which are mostly private IPs owned by semiconductor companies. In this work, we propose SynC-LLM, the first technique that exploits LLM’s ability to generate new large-scale synthetic digital circuits. In our hierarchical circuit generation process, we first design a directed graph diffusion model to learn and generate the skeleton of large circuits with sequential cells. Then we propose a cone function retrieval technique to annotate each sequential node in the skeleton with a function description. Finally, we apply a level-by-level customized prompting technique utilizing LLM to complete the code at every skeleton cone. Experiments show that our generated circuits are not only valid and fully functional, but also closely resemble realistic large-scale designs and can significantly improve AI models’ performance in multiple IC design tasks. The code and data are open-sourced in https://github.com/hkust-zhiyao/SynCircuitData.

Retrieval-augmented generation (RAG) has emerged as a promising solution for mitigating hallucinations of large language models (LLMs) with retrieved external knowledge. Adaptive RAG enhances this approach by enabling dynamic retrieval during generation, activating retrieval only when the query exceeds LLM’s internal knowledge. Existing methods primarily focus on detecting LLM’s confidence via statistical uncertainty. Instead, we present the first attempts to solve adaptive RAG from a representation perspective and develop an inherent control-based framework, termed CtrlA. Specifically, we extract the features that represent the honesty and confidence directions of LLM and adopt them to control LLM behavior and guide retrieval timing decisions. We also design a simple yet effective query formulation strategy to support adaptive retrieval. Experiments show that CtrlA is superior to existing adaptive RAG methods on a diverse set of tasks. Honesty steering can effectively make LLMs more honest and confidence monitoring is a promising indicator of retrieval trigger.

pdf bib abs
Evaluating and Mitigating Object Hallucination in Large Vision-Language Models: Can They Still See Removed Objects?
Yixiao He | Haifeng Sun | Pengfei Ren | Jingyu Wang | Huazheng Wang | Qi Qi | Zirui Zhuang | Jing Wang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Vision-Language Models (LVLMs) have a significant issue with object hallucinations, where researchers have noted that LVLMs often mistakenly determine objects as present in images where they do not actually exist. Some recent studies evaluate the occurrence of object hallucinations by asking LVLMs whether they see objects that do not exist in input images. However, we observe that these evaluation methods have some limitations, such as the objects being questioned potentially having little relevance to the image. In this paper, we introduce a more challenging benchmark for evaluating object hallucinations by removing objects from images and then asking the model whether it can still see the removed objects. Our evaluation result reveals that LVLMs suffer from severe hallucinations, as they often still claim to see the removed objects. Through our analysis, we find that biases in training result in LVLMs lacking guidance on learning about the absence of objects, which in turn leads to a lack of ability to determine that objects do not exist in images. To address this issue, we further propose oDPO, a direct preference optimization objective based on visual objects. By guiding LVLMs to learn to determine the existence of objects, oDPO effectively alleviates object hallucinations. It achieves more competitive results than other hallucination mitigation approaches across multiple object hallucination benchmarks and enhances the performance of LVLMs in various vision-language tasks.

2024

pdf bib abs
Unsupervised Contrast-Consistent Ranking with Language Models
Niklas Stoehr | Pengxiang Cheng | Jing Wang | Daniel Preotiuc-Pietro | Rajarshi Bhowmik
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank product reviews by sentiment. We compare pairwise, pointwise and listwise prompting techniques to elicit a language model’s ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probe guided by a logical constraint: a language model’s representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks where all items are related via consistent, pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss and an Ordinal Regression objective. Across different models and datasets, our results confirm that CCR probing performs better or, at least, on a par with prompting.

pdf bib abs
HPipe: Large Language Model Pipeline Parallelism for Long Context on Heterogeneous Cost-effective Devices
Ruilong Ma | Xiang Yang | Jingyu Wang | Qi Qi | Haifeng Sun | Jing Wang | Zirui Zhuang | Jianxin Liao
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

Micro-enterprises and individual developers emerge analysis demands for long sequence with powerful Large Language Models (LLMs). They try to deploy the LLMs at local, but only possess various commodity devices and the unreliable interconnection between devices. Existing parallel techniques do not lead to the same effectiveness in limited environment. The heterogeneity of devices, coupled with their limited capacity and expensive communication, brings challenges to private deployment for maximized utilization of available devices while masking latency. Hence, we introduce HPipe, a pipeline inference framework that successfully mitigates LLMs from high-performance clusters to heterogeneous commodity devices. By ensuring a balanced distribution of workloads, HPipe facilitates the parallel execution of LLMs through pipelining the sequences on the token dimension. The evaluation conducted on LLaMA-7B and GPT3-2B demonstrates that HPipe holds the potential for context analysis on LLM with heterogeneity devices, achieving an impressive speedup in latency and throughput up to 2.28 times.

2023

pdf bib
Analyzing and Predicting Persistence of News Tweets
Maggie Liu | Jing Wang | Daniel Preotiuc-Pietro
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2022

pdf bib abs
Modeling Aspect Correlation for Aspect-based Sentiment Analysis via Recurrent Inverse Learning Guidance
Longfeng Li | Haifeng Sun | Qi Qi | Jingyu Wang | Jing Wang | Jianxin Liao
Proceedings of the 29th International Conference on Computational Linguistics

Aspect-based sentiment analysis (ABSA) aims to distinguish sentiment polarity of every specific aspect in a given sentence. Previous researches have realized the importance of interactive learning with context and aspects. However, these methods are ill-studied to learn complex sentence with multiple aspects due to overlapped polarity feature. And they do not consider the correlation between aspects to distinguish overlapped feature. In order to solve this problem, we propose a new method called Recurrent Inverse Learning Guided Network (RILGNet). Our RILGNet has two points to improve the modeling of aspect correlation and the selecting of aspect feature. First, we use Recurrent Mechanism to improve the joint representation of aspects, which enhances the aspect correlation modeling iteratively. Second, we propose Inverse Learning Guidance to improve the selection of aspect feature by considering aspect correlation, which provides more useful information to determine polarity. Experimental results on SemEval 2014 Datasets demonstrate the effectiveness of RILGNet, and we further prove that RILGNet is state-of-the-art method in multiaspect scenarios.

2020

pdf bib abs
Multi-Domain Named Entity Recognition with Genre-Aware and Agnostic Inference
Jing Wang | Mayank Kulkarni | Daniel Preotiuc-Pietro
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Named entity recognition is a key component of many text processing pipelines and it is thus essential for this component to be robust to different types of input. However, domain transfer of NER models with data from multiple genres has not been widely studied. To this end, we conduct NER experiments in three predictive setups on data from: a) multiple domains; b) multiple domains where the genre label is unknown at inference time; c) domains not encountered in training. We introduce a new architecture tailored to this task by using shared and private domain parameters and multi-task learning. This consistently outperforms all other baseline and competitive methods on all three experimental setups, with differences ranging between +1.95 to +3.11 average F1 across multiple genres when compared to standard approaches. These results illustrate the challenges that need to be taken into account when building real-world NLP applications that are robust to various types of text and the methods that can help, at least partially, alleviate these issues.

2015

pdf bib abs
A Sense-Topic Model for Word Sense Induction with Unsupervised Data Enrichment
Jing Wang | Mohit Bansal | Kevin Gimpel | Brian D. Ziebart | Clement T. Yu
Transactions of the Association for Computational Linguistics, Volume 3

Word sense induction (WSI) seeks to automatically discover the senses of a word in a corpus via unsupervised methods. We propose a sense-topic model for WSI, which treats sense and topic as two separate latent variables to be inferred jointly. Topics are informed by the entire document, while senses are informed by the local context surrounding the ambiguous word. We also discuss unsupervised ways of enriching the original corpus in order to improve model performance, including using neural word embeddings and external corpora to expand the context of each data instance. We demonstrate significant improvements over the previous state-of-the-art, achieving the best results reported to date on the SemEval-2013 WSI task.