Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI - ACL Anthology

Alicia Parrish (Editor)

Anthology ID: 2023.artofsafety-1
Month: November
Year: 2023
Address: Bali, Indonesia
Venues: artofsafety | WS
SIG:
Publisher: Association for Computational Linguistics
URL: https://preview.aclanthology.org/bootstrap-5/2023.artofsafety-1/
DOI:

Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI
Alicia Parrish

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang

Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang

Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting

Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu

Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma

Uncovering Bias in AI-Generated Images
Kimberley Baxter