Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI

Alicia Parrish (Editor)


Anthology ID: 2023.artofsafety-1
Month: November
Year: 2023
Address: Bali, Indonesia
Venues: artofsafety | WS
Publisher: Association for Computational Linguistics
URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2023.artofsafety-1/

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang

Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang

Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting

Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu

Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma

Uncovering Bias in AI-Generated Images
Kimberley Baxter