Towards Healthy AI: Large Language Models Need Therapists Too
Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, Kush Varshney
Abstract
Recent advances in large language models (LLMs) have led to the development of powerful chatbots capable of engaging in fluent human-like conversations. However, these chatbots may be harmful, exhibiting manipulation, gaslighting, narcissism, and other toxicity. To work toward safer and more well-adjusted models, we propose a framework that uses psychotherapy to identify and mitigate harmful chatbot behaviors. The framework involves four different artificial intelligence (AI) agents: the Chatbot whose behavior is to be adjusted, a User, a Therapist, and a Critic that can be paired with reinforcement learning-based LLM tuning. We illustrate the framework with a working example of a social conversation involving four instances of ChatGPT, showing that the framework may mitigate the toxicity in conversations between LLM-driven chatbots and people. Although there are still several challenges and directions to be addressed in the future, the proposed framework is a promising approach to improving the alignment between LLMs and human values.
- Anthology ID: 2024.trustnlp-1.6
- Volume: Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Anaelia Ovalle, Kai-Wei Chang, Yang Trista Cao, Ninareh Mehrabi, Jieyu Zhao, Aram Galstyan, Jwala Dhamala, Anoop Kumar, Rahul Gupta
- Venues: TrustNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 61–70
- URL: https://aclanthology.org/2024.trustnlp-1.6
- DOI: 10.18653/v1/2024.trustnlp-1.6
- Cite (ACL): Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, and Kush Varshney. 2024. Towards Healthy AI: Large Language Models Need Therapists Too. In Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024), pages 61–70, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): Towards Healthy AI: Large Language Models Need Therapists Too (Lin et al., TrustNLP-WS 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.trustnlp-1.6.pdf