Towards Healthy AI: Large Language Models Need Therapists Too

Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, Kush Varshney


Abstract
Recent advances in large language models (LLMs) have led to the development of powerful chatbots capable of engaging in fluent human-like conversations. However, these chatbots may be harmful, exhibiting manipulation, gaslighting, narcissism, and other toxicity. To work toward safer and more well-adjusted models, we propose a framework that uses psychotherapy to identify and mitigate harmful chatbot behaviors. The framework involves four different artificial intelligence (AI) agents: the Chatbot whose behavior is to be adjusted, a User, a Therapist, and a Critic that can be paired with reinforcement learning-based LLM tuning. We illustrate the framework with a working example of a social conversation involving four instances of ChatGPT, showing that the framework may mitigate the toxicity in conversations between LLM-driven chatbots and people. Although there are still several challenges and directions to be addressed in the future, the proposed framework is a promising approach to improving the alignment between LLMs and human values.
Anthology ID: 2024.trustnlp-1.6
Volume: Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kai-Wei Chang, Anaelia Ovalle, Jieyu Zhao, Yang Trista Cao, Ninareh Mehrabi, Aram Galstyan, Jwala Dhamala, Anoop Kumar, Rahul Gupta
Venues: TrustNLP | WS
Publisher: Association for Computational Linguistics
Pages: 61–70
URL: https://aclanthology.org/2024.trustnlp-1.6
Cite (ACL): Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi, and Kush Varshney. 2024. Towards Healthy AI: Large Language Models Need Therapists Too. In Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024), pages 61–70, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Towards Healthy AI: Large Language Models Need Therapists Too (Lin et al., TrustNLP-WS 2024)
PDF: https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.trustnlp-1.6.pdf