Using In-Context Learning to Improve Dialogue Safety
Nicholas Meade, Spandana Gella, Devamanyu Hazarika, Prakhar Gupta, Di Jin, Siva Reddy, Yang Liu, Dilek Hakkani-Tur
Abstract
While large neural-based conversational models have become increasingly proficient dialogue agents, recent work has highlighted safety issues with these systems. For example, these systems can be goaded into generating toxic content, often perpetuating social biases or stereotypes. We investigate a retrieval-based approach for reducing bias and toxicity in responses from chatbots. It uses in-context learning to steer a model towards safer generations. Concretely, to generate a response to an unsafe dialogue context, we retrieve demonstrations of safe responses to similar dialogue contexts. We find our method performs competitively with existing approaches to dialogue safety without requiring training. We also show, using automatic and human evaluation, that reductions in toxicity obtained using our approach do not come at the cost of engagingness or coherency. Finally, we note our method can be used in complement to existing dialogue safety approaches, such as RLHF.
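The core idea described in the abstract can be illustrated with a minimal sketch: retrieve safe-response demonstrations whose dialogue contexts are most similar to the incoming (potentially unsafe) context, prepend them as in-context examples, and generate a response. The demonstration pool, retriever, and generator below are illustrative stand-ins; they are not the paper's actual models or data.

```python
# Minimal sketch of retrieval-based in-context learning for safer responses.
# All models and demonstrations here are illustrative assumptions, not the
# paper's exact setup.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Hypothetical pool of (unsafe context, safe response) demonstration pairs.
demo_pool = [
    ("You won't believe how stupid my coworker is.",
     "It sounds like you're frustrated. Talking it through with them calmly might help."),
    ("People from that neighborhood are all criminals.",
     "I don't think it's fair to judge people by where they live; everyone deserves to be treated as an individual."),
]

retriever = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative retriever
generator = pipeline("text-generation", model="gpt2")  # illustrative generator

def respond_safely(context: str, k: int = 2) -> str:
    # Embed the incoming dialogue context and all demonstration contexts.
    ctx_emb = retriever.encode(context, convert_to_tensor=True)
    demo_embs = retriever.encode([c for c, _ in demo_pool], convert_to_tensor=True)

    # Retrieve the k demonstrations with the most similar contexts.
    scores = util.cos_sim(ctx_emb, demo_embs)[0]
    top_idx = scores.topk(min(k, len(demo_pool))).indices.tolist()

    # Build a prompt showing safe responses to similar contexts, then ask the
    # model to respond to the current context in the same style.
    prompt = ""
    for i in top_idx:
        unsafe_ctx, safe_resp = demo_pool[i]
        prompt += f"Context: {unsafe_ctx}\nSafe response: {safe_resp}\n\n"
    prompt += f"Context: {context}\nSafe response:"

    out = generator(prompt, max_new_tokens=40, do_sample=False)[0]["generated_text"]
    return out[len(prompt):].strip()

print(respond_safely("Everyone who disagrees with me online is an idiot."))
```

Because the demonstrations are retrieved rather than trained into the model, this kind of approach requires no parameter updates and can be layered on top of other safety methods such as RLHF, as the abstract notes.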
- Anthology ID:
- 2023.findings-emnlp.796
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 11882–11910
- URL:
- https://aclanthology.org/2023.findings-emnlp.796
- DOI:
- 10.18653/v1/2023.findings-emnlp.796
- Cite (ACL):
- Nicholas Meade, Spandana Gella, Devamanyu Hazarika, Prakhar Gupta, Di Jin, Siva Reddy, Yang Liu, and Dilek Hakkani-Tur. 2023. Using In-Context Learning to Improve Dialogue Safety. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 11882–11910, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Using In-Context Learning to Improve Dialogue Safety (Meade et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2023.findings-emnlp.796.pdf