PERSONACHATGEN: Generating Personalized Dialogues using GPT-3

Young-Jun Lee, Chae-Gyun Lim, Yunsu Choi, Ji-Hui Lm, Ho-Jin Choi


Abstract
Recently, many works have trained dialogue agents to generate more personalized and engaging responses using PERSONACHAT. However, because this dataset was frozen in 2018, dialogue agents trained on it would not know how to interact with a human who loves “Wandavision.” One way to alleviate this problem is to create a large-scale dataset. In this work, we introduce the pipeline for creating PERSONACHATGEN, which comprises three main components: creating (1) PROFILEGEN, (2) the Persona Set, and (3) PERSONACHATGEN. To encourage GPT-3’s generation ability, we also define a taxonomy of hierarchical persona categories derived from a social profiling taxonomy. To create a speaker-consistent persona set, we propose a simple contradiction-based iterative sentence replacement algorithm, named CoNL. Moreover, to prevent GPT-3 from generating harmful content, we present two filtering pipelines, one each for PROFILEGEN and PERSONACHATGEN. Through an analysis of PERSONACHATGEN, we show that GPT-3 can generate personalized dialogues containing diverse personas. Furthermore, we show that a state-of-the-art Blender 90M trained on our dataset achieves higher performance.
Anthology ID:
2022.ccgpk-1.4
Volume:
Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Heuiseok Lim, Seungryong Kim, Yeonsoo Lee, Steve Lin, Paul Hongsuck Seo, Yumin Suh, Yoonna Jang, Jungwoo Lim, Yuna Hur, Suhyune Son
Venue:
CCGPK
Publisher:
Association for Computational Linguistics
Pages:
29–48
URL:
https://aclanthology.org/2022.ccgpk-1.4
Cite (ACL):
Young-Jun Lee, Chae-Gyun Lim, Yunsu Choi, Ji-Hui Lm, and Ho-Jin Choi. 2022. PERSONACHATGEN: Generating Personalized Dialogues using GPT-3. In Proceedings of the 1st Workshop on Customized Chat Grounding Persona and Knowledge, pages 29–48, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
PERSONACHATGEN: Generating Personalized Dialogues using GPT-3 (Lee et al., CCGPK 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.ccgpk-1.4.pdf
Code:
passing2961/personachatgen