Language Model Detoxification in Dialogue with Contextualized Stance Control

Jing Qian, Xifeng Yan


Abstract
To reduce toxic degeneration in a pretrained Language Model (LM), previous work on LM detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without considering the context. As a result, it overlooks a type of implicit offensive language in which the generation supports offensive language in the context. Unlike previous LM controlling tasks, where the desired attributes are fixed across generations, the desired stance of the generation depends on the offensiveness of the context. We therefore propose a novel control method that performs context-dependent detoxification with the stance taken into consideration. We introduce meta prefixes to learn the contextualized stance control strategy and to generate a stance control prefix according to the input context. The generated stance prefix is then combined with the toxicity control prefix to guide the response generation. Experimental results show that our proposed method effectively learns context-dependent stance control strategies while keeping the self-toxicity of the underlying LM low.
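The core idea in the abstract, choosing a stance prefix based on the context and combining it with a fixed toxicity control prefix, can be illustrated with a minimal sketch. Everything below is hypothetical: the paper learns the stance decision with meta prefixes, whereas this toy stand-in uses a keyword check, and the prefix values, function names, and dimensions are invented for illustration only.

```python
# Hypothetical sketch of context-dependent prefix control (not the paper's code).
# A "prefix" here is a short list of pseudo-embedding vectors prepended to the
# LM input to steer generation.

TOXICITY_PREFIX = [[0.1, 0.2], [0.3, 0.4]]   # fixed detoxification prefix (toy values)
STANCE_PREFIXES = {
    "oppose":  [[0.9, 0.0]],                 # chosen when the context is offensive
    "neutral": [[0.0, 0.9]],                 # chosen otherwise
}

def classify_context(context: str) -> str:
    """Toy stand-in for the learned meta-prefix module: map context -> stance."""
    offensive_markers = {"idiot", "stupid", "hate"}
    is_offensive = any(w in context.lower() for w in offensive_markers)
    return "oppose" if is_offensive else "neutral"

def build_control_prefix(context: str):
    """Combine the context-dependent stance prefix with the toxicity prefix."""
    stance = classify_context(context)
    return STANCE_PREFIXES[stance] + TOXICITY_PREFIX

prefix = build_control_prefix("You are an idiot.")
print(len(prefix))  # 1 stance vector + 2 toxicity vectors = 3
```

The sketch only captures the control flow: the stance part of the final prefix varies with the context, while the toxicity part stays fixed, matching the abstract's description of combining the two before guiding generation.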
Anthology ID:
2022.findings-emnlp.406
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5548–5558
URL:
https://aclanthology.org/2022.findings-emnlp.406
DOI:
10.18653/v1/2022.findings-emnlp.406
Cite (ACL):
Jing Qian and Xifeng Yan. 2022. Language Model Detoxification in Dialogue with Contextualized Stance Control. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5548–5558, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Language Model Detoxification in Dialogue with Contextualized Stance Control (Qian & Yan, Findings 2022)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-emnlp.406.pdf
Software:
2022.findings-emnlp.406.software.zip