Improving Covert Toxicity Detection by Retrieving and Generating References

Dong-Ho Lee, Hyundong Cho, Woojeong Jin, Jihyung Moon, Sungjoon Park, Paul Röttger, Jay Pujara, Roy Ka-wei Lee


Abstract
Models for detecting toxic content play an important role in keeping people safe online. There has been much progress in detecting overt toxicity. Covert toxicity, however, remains a challenge because its detection requires an understanding of implicit meaning and subtle connotations. In this paper, we explore the potential of leveraging references, such as external knowledge and textual interpretations, to enhance the detection of covert toxicity. We run experiments on two covert toxicity datasets with two types of references: 1) information retrieved from a search API, and 2) interpretations generated by large language models. We find that both types of references improve detection, with the latter being more useful than the former. We also find that generating interpretations grounded on properties of covert toxicity, such as humor and irony, leads to the largest improvements.
Anthology ID:
2024.woah-1.21
Volume:
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yi-Ling Chung, Zeerak Talat, Debora Nozza, Flor Miriam Plaza-del-Arco, Paul Röttger, Aida Mostafazadeh Davani, Agostina Calabrese
Venues:
WOAH | WS
Publisher:
Association for Computational Linguistics
Pages:
266–274
URL:
https://aclanthology.org/2024.woah-1.21
DOI:
10.18653/v1/2024.woah-1.21
Cite (ACL):
Dong-Ho Lee, Hyundong Cho, Woojeong Jin, Jihyung Moon, Sungjoon Park, Paul Röttger, Jay Pujara, and Roy Ka-wei Lee. 2024. Improving Covert Toxicity Detection by Retrieving and Generating References. In Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), pages 266–274, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Improving Covert Toxicity Detection by Retrieving and Generating References (Lee et al., WOAH-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.woah-1.21.pdf