Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI - ACL Anthology

Alicia Parrish (Editor)

Anthology ID: 2023.artofsafety-1
Month: November
Year: 2023
Address: Bali, Indonesia
Venues: artofsafety | WS
SIG:
Publisher: Association for Computational Linguistics
URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2023.artofsafety-1/
DOI:
Bib Export formats: BibTeX

Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI
Alicia Parrish

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang

Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang

Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting

Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu

Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma

Uncovering Bias in AI-Generated Images
Kimberley Baxter