Rongwu Xu
2024
Preemptive Answer “Attacks” on Chain-of-Thought Reasoning
Rongwu Xu
|
Zehan Qi
|
Wei Xu
Findings of the Association for Computational Linguistics ACL 2024
Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought (CoT) prompting. However, the robustness of this approach warrants further investigation. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. This situation can arise inadvertently or induced by malicious users by prompt injection attacks. Experiments reveal that preemptive answers significantly impair the model’s reasoning capability across various CoT methods and a broad spectrum of datasets. To bolster the robustness of reasoning, we propose two measures aimed at mitigating this issue to some extent.
The Earth is Flat because...: Investigating LLMs’ Belief towards Misinformation via Persuasive Conversation
Rongwu Xu
|
Brian Lin
|
Shujian Yang
|
Tianqi Zhang
|
Weiyan Shi
|
Tianwei Zhang
|
Zhixuan Fang
|
Wei Xu
|
Han Qiu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) encapsulate vast amounts of knowledge but still remain vulnerable to external misinformation. Existing research mainly studied this susceptibility behavior in a single-turn setting. However, belief can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs’ susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs’ belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs’ correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies.
Search
Co-authors
- Wei Xu 2
- Zehan Qi 1
- Brian Lin 1
- Shujian Yang 1
- Tianqi Zhang 1
- show all...