Ece Kamar
2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen | Saadia Gabriel | Hamid Palangi | Maarten Sap | Dipankar Ray | Ece Kamar
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding method to generate subtly toxic and benign text with a massive pretrained language model. Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale, and about more demographic groups, than previous resources of human-written text. We conduct a human evaluation on a challenging subset of ToxiGen and find that annotators struggle to distinguish machine-generated text from human-written language. We also find that 94.5% of toxic examples are labeled as hate speech by human annotators. Using three publicly-available datasets, we show that finetuning a toxicity classifier on our data improves its performance on human-written data substantially. We also demonstrate that ToxiGen can be used to fight machine-generated toxicity as finetuning improves the classifier significantly on our evaluation subset.
2014
Crowdsourcing Language Generation Templates for Dialogue Systems
Margaret Mitchell | Dan Bohus | Ece Kamar
Proceedings of the INLG and SIGDIAL 2014 Joint Session
2012
Towards Situated Collaboration
Dan Bohus | Ece Kamar | Eric Horvitz
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)
Co-authors
- Dan Bohus 2
- Thomas Hartvigsen 1
- Saadia Gabriel 1
- Hamid Palangi 1
- Maarten Sap 1