Siqi Wang

Other people with similar names: Siqi Wang

Unverified author pages with similar names: Siqi Wang


Fixing paper assignments

  1. Please select all papers that do not belong to this person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2024

pdf bib
CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference
Erxin Yu | Jing Li | Ming Liao | Siqi Wang | Gao Zuchen | Fei Mi | Lanqing Hong
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

As large language models (LLMs) constantly evolve, ensuring their safety remains a critical research issue. Previous red teaming approaches for LLM safety have primarily focused on single prompt attacks or goal hijacking. To the best of our knowledge, we are the first to study LLM safety in multi-turn dialogue coreference. We created a dataset of 1,400 questions across 14 categories, each featuring multi-turn coreference safety attacks. We then conducted detailed evaluations on five widely used open-source LLMs. The results indicated that under multi-turn coreference safety attacks, the highest attack success rate was 56% with the LLaMA2-Chat-7b model, while the lowest was 13.9% with the Mistral-7B-Instruct model. These findings highlight the safety vulnerabilities in LLMs during dialogue coreference interactions.