Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
Tao Meng, Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Aram Galstyan, Richard Zemel, Kai-Wei Chang, Rahul Gupta, Charith Peris
Abstract
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control. Given a training corpus and control criteria formulated as a sequence-level constraint on model outputs, our method fine-tunes the LLM on the training corpus while enhancing constraint satisfaction with minimal impact on its utility and generation quality. Specifically, our approach regularizes the LLM training by penalizing the KL divergence between the desired output distribution, which satisfies the constraints, and the LLM's posterior. This regularization term can be approximated by an auxiliary model trained to decompose the sequence-level constraints into token-level guidance, allowing the term to be measured by a closed-form formulation. To further improve efficiency, we design a parallel scheme for concurrently updating both the LLM and the auxiliary model. We evaluate the empirical performance of our approach by controlling the toxicity when training an LLM. We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
- Anthology ID:
- 2024.findings-emnlp.779
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 13329–13341
- URL:
- https://preview.aclanthology.org/ingest_wac_2008/2024.findings-emnlp.779/
- DOI:
- 10.18653/v1/2024.findings-emnlp.779
- Cite (ACL):
- Tao Meng, Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Aram Galstyan, Richard Zemel, Kai-Wei Chang, Rahul Gupta, and Charith Peris. 2024. Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 13329–13341, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification (Meng et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/ingest_wac_2008/2024.findings-emnlp.779.pdf
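The abstract's core idea, a standard fine-tuning loss plus a KL penalty between the constraint-satisfying distribution and the model's posterior, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy dict-based distributions, and the way `token_guidance` reweights the model posterior are all illustrative assumptions; in the paper the guidance comes from a trained auxiliary model that decomposes the sequence-level constraint into token-level scores.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) over a shared token vocabulary (dicts of probabilities)."""
    return sum(q[t] * math.log(q[t] / p[t]) for t in q if q[t] > 0)

def regularized_loss(nll, p_model, token_guidance, lam=1.0):
    """Illustrative per-step objective: standard negative log-likelihood
    plus a KL penalty pulling the model posterior toward a desired
    distribution q. Here q is formed by reweighting p_model with
    hypothetical token-level guidance scores (stand-ins for the
    auxiliary model's output) and renormalizing."""
    unnorm = {t: p_model[t] * token_guidance.get(t, 1.0) for t in p_model}
    z = sum(unnorm.values())
    q = {t: v / z for t, v in unnorm.items()}
    return nll + lam * kl_divergence(q, p_model)

# Toy posterior over two tokens; downweighting an undesired token
# (e.g. a toxic continuation) makes the penalty term positive.
p = {"safe": 0.5, "toxic": 0.5}
print(regularized_loss(2.0, p, {}))             # no guidance: q == p, KL = 0
print(regularized_loss(2.0, p, {"toxic": 0.1})) # guidance active: loss > nll
```

With uniform (absent) guidance the desired distribution coincides with the posterior, the KL term vanishes, and the objective reduces to plain fine-tuning, which matches the abstract's claim of minimal impact on utility when constraints are already satisfied.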