Xinyi Wu
2026
Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs
Man Hu | Xinyi Wu | Zhufeng Suo | Jinbo Feng | Linghui Meng | Yanhao Jia | Anh Tuan Luu | Shuai Zhao
Findings of the Association for Computational Linguistics: ACL 2026
Man Hu | Xinyi Wu | Zhufeng Suo | Jinbo Feng | Linghui Meng | Yanhao Jia | Anh Tuan Luu | Shuai Zhao
Findings of the Association for Computational Linguistics: ACL 2026
With the rise of advanced reasoning capabilities, large language models (LLMs) are receiving increasing attention. While reasoning enhances LLMs’ performance on downstream tasks, it also introduces new threat vectors, as adversaries can leverage these capabilities to conduct backdoor attacks. Prior surveys provide broad overviews of backdoor attacks and reasoning security; however, a systematic survey focused on backdoor attacks and defenses against LLM reasoning is still absent. In this paper, we take the first step toward providing a comprehensive review of reasoning-based backdoor attacks in LLMs by analyzing their underlying mechanisms, methodological frameworks, and unresolved challenges. Specifically, we introduce a new taxonomy that offers a unified perspective for summarizing existing approaches, categorizing reasoning-based backdoor attacks into associative, passive, and active. We also summarize defenses against such attacks and discuss current challenges alongside future research directions.
P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs
Shuai Zhao | Xinyi Wu | Shiqian Zhao | Xiaobao Wu | Zhongliang Guo | Yanhao Jia | Anh Tuan Luu
Findings of the Association for Computational Linguistics: ACL 2026
Shuai Zhao | Xinyi Wu | Shiqian Zhao | Xiaobao Wu | Zhongliang Guo | Yanhao Jia | Anh Tuan Luu
Findings of the Association for Computational Linguistics: ACL 2026
Defending Large Language Models (LLMs) against backdoor attacks has long been trapped in a "cat-and-mouse" dilemma, where defenders passively react to ever-shifting attack strategies. To break this cycle, we posit that proactive immunization is inherently superior to reactive sanitization. In this study, we propose Poison-to-Poison (P2P), a general and effective defense algorithm that introduces a paradigm shift. Instead of waiting to detect malicious samples, P2P strategically implants benign triggers to reshape the model’s decision boundary, redirecting latent feature activation from malicious trajectories to a safe, controllable output space. This enforces the model to associate trigger-induced representations with safe outputs, thereby overriding the effects of original malicious triggers. Thanks to this robust and generalizable trigger-based fine-tuning, P2P is effective across task settings and attack types. Theoretically and empirically, we show that P2P can neutralize malicious backdoors while preserving task performance. We conduct extensive experiments on classification, mathematical reasoning, and summary generation tasks, involving multiple state-of-the-art LLMs. The results demonstrate that our P2P algorithm significantly reduces the attack success rate compared with baseline models. We hope that P2P can serve as a practical guideline for defending against backdoor attacks in the Model as a Service (MaaS) scenario, where benign prompts are embedded within the system to regulate model behavior.
2025
Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education
Yanhao Jia | Xinyi Wu | Li Hao | QinglinZhang QinglinZhang | Yuxiao Hu | Shuai Zhao | Wenqi Fan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Yanhao Jia | Xinyi Wu | Li Hao | QinglinZhang QinglinZhang | Yuxiao Hu | Shuai Zhao | Wenqi Fan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In AI-facilitated teaching, leveraging various query styles to interpret abstract text descriptions is crucial for ensuring high-quality teaching. However, current retrieval models primarily focus on natural text-image retrieval, making them insufficiently tailored to educational scenarios due to the ambiguities in the retrieval process. In this paper, we propose a diverse expression retrieval task tailored to educational scenarios, supporting retrieval based on multiple query styles and expressions. We introduce the STEM Education Retrieval Dataset (SER), which contains over 24,000 query pairs of different styles, and the Uni-Retrieval, an efficient and style-diversified retrieval vision-language model based on prompt tuning. Uni-Retrieval extracts query style features as prototypes and builds a continuously updated Prompt Bank containing prompt tokens for diverse queries. This bank can updated during test time to represent domain-specific knowledge for different subject retrieval scenarios. Our framework demonstrates scalability and robustness by dynamically retrieving prompt tokens based on prototype similarity, effectively facilitating learning for unknown queries. Experimental results indicate that Uni-Retrieval outperforms existing retrieval models in most retrieval tasks.
2021
Explaining Relationships Between Scientific Documents
Kelvin Luu | Xinyi Wu | Rik Koncel-Kedziorski | Kyle Lo | Isabel Cachola | Noah A. Smith
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Kelvin Luu | Xinyi Wu | Rik Koncel-Kedziorski | Kyle Lo | Isabel Cachola | Noah A. Smith
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
We address the task of explaining relationships between two scientific documents using natural language text. This task requires modeling the complex content of long technical documents, deducing a relationship between these documents, and expressing the details of that relationship in text. In addition to the theoretical interest of this task, successful solutions can help improve researcher efficiency in search and review. In this paper we establish a dataset of 622K examples from 154K documents. We pretrain a large language model to serve as the foundation for autoregressive approaches to the task. We explore the impact of taking different views on the two documents, including the use of dense representations extracted with scientific IE systems. We provide extensive automatic and human evaluations which show the promise of such models, but make clear challenges for future work.
2020
Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets
Chuanrong Li | Lin Shengshuo | Zeyu Liu | Xinyi Wu | Xuhui Zhou | Shane Steinert-Threlkeld
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Chuanrong Li | Lin Shengshuo | Zeyu Liu | Xinyi Wu | Xuhui Zhou | Shane Steinert-Threlkeld
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Although large-scale pretrained language models, such as BERT and RoBERTa, have achieved superhuman performance on in-distribution test sets, their performance suffers on out-of-distribution test sets (e.g., on contrast sets). Building contrast sets often requires human-expert annotation, which is expensive and hard to create on a large scale. In this work, we propose a Linguistically-Informed Transformation (LIT) method to automatically generate contrast sets, which enables practitioners to explore linguistic phenomena of interests as well as compose different phenomena. Experimenting with our method on SNLI and MNLI shows that current pretrained language models, although being claimed to contain sufficient linguistic knowledge, struggle on our automatically generated contrast sets. Furthermore, we improve models’ performance on the contrast sets by applying LIT to augment the training data, without affecting performance on the original data.
2019
What Makes a Good Counselor? Learning to Distinguish between High-quality and Low-quality Counseling Conversations
Verónica Pérez-Rosas | Xinyi Wu | Kenneth Resnicow | Rada Mihalcea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Verónica Pérez-Rosas | Xinyi Wu | Kenneth Resnicow | Rada Mihalcea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
The quality of a counseling intervention relies highly on the active collaboration between clients and counselors. In this paper, we explore several linguistic aspects of the collaboration process occurring during counseling conversations. Specifically, we address the differences between high-quality and low-quality counseling. Our approach examines participants’ turn-by-turn interaction, their linguistic alignment, the sentiment expressed by speakers during the conversation, as well as the different topics being discussed. Our results suggest important language differences in low- and high-quality counseling, which we further use to derive linguistic features able to capture the differences between the two groups. These features are then used to build automatic classifiers that can predict counseling quality with accuracies of up to 88%.
Search
Fix author
Co-authors
- Yanhao Jia 3
- Shuai Zhao 3
- Luu Anh Tuan 2
- Isabel Cachola 1
- Wenqi Fan 1
- Jinbo Feng 1
- Zhongliang Guo 1
- Li Hao 1
- Man Hu 1
- Yuxiao Hu 1
- Rik Koncel-Kedziorski 1
- Chuanrong Li 1
- Zeyu Liu 1
- Kyle Lo 1
- Kelvin Luu 1
- Linghui Meng 1
- Rada Mihalcea 1
- Verónica Pérez-Rosas 1
- QinglinZhang QinglinZhang 1
- Kenneth Resnicow 1
- Lin Shengshuo 1
- Noah A. Smith 1
- Shane Steinert-Threlkeld 1
- Zhufeng Suo 1
- Xiaobao Wu 1
- Shiqian Zhao 1
- Xuhui Zhou 1