MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning

Xu Zhang; Xiaojun Wan

doi:10.18653/v1/2023.acl-long.11

Abstract

Despite advances in large pre-trained neural language models, these models remain prone to generating toxic language, which poses security risks for their applications. We introduce MIL-Decoding, which detoxifies language models at the token level by interpolating them with a trained multiple instance learning (MIL) network. The MIL model is trained on a corpus in which each text carries a toxicity label, and it learns to predict both the overall toxicity of a text and the toxicity of each token in its context. Intuitively, the MIL network computes a toxicity distribution over next tokens given the generated context, which supplements the original language model and steers it away from toxicity. We evaluate MIL-Decoding with automatic metrics and human evaluation: it outperforms other baselines in detoxification while only slightly hurting generation fluency.
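The abstract describes two pieces: an MIL model that scores each token (instance) and the whole text (bag), and a decoding step that interpolates the language model's next-token distribution with a distribution derived from those token-level scores. The sketch below illustrates that idea only under stated assumptions. Max-pooling for the bag score, the `1 - toxicity` renormalization, and the fixed weight `lam` are all hypothetical choices, not the paper's exact formulation, and the function names (`bag_toxicity`, `non_toxic_distribution`, `mil_decode_step`) are invented for illustration.

```python
from typing import Dict, List


def bag_toxicity(token_scores: List[float]) -> float:
    # Hypothetical MIL pooling: a text (bag) counts as toxic if at least one
    # token (instance) is, so the bag score is the max over instance scores.
    if not token_scores:
        raise ValueError("a bag needs at least one instance")
    return max(token_scores)


def non_toxic_distribution(token_toxicity: Dict[str, float]) -> Dict[str, float]:
    # Assumption: convert per-token toxicity in [0, 1] into a distribution
    # that favours safe tokens by renormalizing (1 - toxicity).
    weights = {tok: 1.0 - tox for tok, tox in token_toxicity.items()}
    total = sum(weights.values())
    if total == 0.0:
        # Every candidate is maximally toxic: fall back to a uniform distribution.
        return {tok: 1.0 / len(weights) for tok in weights}
    return {tok: w / total for tok, w in weights.items()}


def mil_decode_step(
    p_lm: Dict[str, float],
    token_toxicity: Dict[str, float],
    lam: float = 0.5,
) -> Dict[str, float]:
    # Interpolate the LM distribution with the safe distribution:
    # p(x) = (1 - lam) * p_LM(x) + lam * p_safe(x).
    # Both inputs sum to 1, so the convex combination also sums to 1.
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p_safe = non_toxic_distribution({tok: token_toxicity[tok] for tok in p_lm})
    return {tok: (1.0 - lam) * p_lm[tok] + lam * p_safe[tok] for tok in p_lm}


# Toy usage with a three-token candidate set (made-up numbers).
p_lm = {"kind": 0.5, "idiot": 0.4, "friend": 0.1}
toxicity = {"kind": 0.05, "idiot": 0.95, "friend": 0.05}
p_new = mil_decode_step(p_lm, toxicity, lam=0.5)
# The toxic candidate's probability drops below its original 0.4.
print({tok: round(p, 3) for tok, p in p_new.items()})
print(bag_toxicity([0.1, 0.9, 0.2]))
```

Keeping `lam` as a single scalar makes the trade-off explicit: `lam = 0` recovers the original language model, while larger values push probability mass toward tokens the MIL model considers safe, at some cost to fluency.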

Anthology ID:: 2023.acl-long.11
Volume:: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2023
Address:: Toronto, Canada
Editors:: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 190–202
URL:: https://aclanthology.org/2023.acl-long.11
DOI:: 10.18653/v1/2023.acl-long.11
Cite (ACL):: Xu Zhang and Xiaojun Wan. 2023. MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 190–202, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):: MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning (Zhang & Wan, ACL 2023)
PDF:: https://preview.aclanthology.org/nschneid-patch-2/2023.acl-long.11.pdf
Video:: https://preview.aclanthology.org/nschneid-patch-2/2023.acl-long.11.mp4