Inass Rachidi
2025
Design, Generation and Evaluation of a Synthetic Dialogue Dataset for Contextually Aware Chatbots in Art Museums
Inass Rachidi | Anas Ezzakri | Jaime Bellver-Soler | Luis Fernando D’Haro
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology
This paper presents the design, synthetic generation, and automated evaluation of ArtGenEval-GPT++, a dataset for training and fine-tuning conversational agents with artificial awareness capabilities in the art domain. Building on a previously released dataset (ArtGenEval-GPT), the new version adds finer-grained personalization (e.g., the visitor's gender, ethnicity, age, and knowledge) and addresses the earlier version's limitations, including low-quality dialogues and hallucinations. The dataset comprises approximately 12,500 dyadic, multi-turn dialogues generated with state-of-the-art large language models (LLMs). These dialogues span diverse museum scenarios, incorporating varied visitor profiles, emotional states, interruptions, and chatbot behaviors. Objective evaluations confirm the dataset’s quality and contextual coherence. Ethical considerations, including biases and hallucinations, are analyzed, and directions for improving the dataset's utility are proposed. This work contributes to the development of personalized, context-aware conversational agents capable of navigating complex, real-world environments such as museums, with the aim of enhancing visitor engagement and satisfaction.
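To make the personalization dimensions above concrete, here is a minimal Python sketch of how scenario specifications might be sampled from visitor profile attributes, emotional states, interruptions, and chatbot behaviors, and then turned into a text prompt for an LLM dialogue generator. This is not the authors' pipeline: every attribute value, field name, the 0.3 interruption rate, and the prompt wording are hypothetical, and no actual LLM is called.

```python
import random
from dataclasses import dataclass

# Hypothetical attribute values; the dataset's actual categories may differ.
PROFILE_OPTIONS = {
    "gender": ["female", "male", "non-binary"],
    "ethnicity": ["Asian", "Black", "Hispanic", "White"],
    "age_group": ["child", "adult", "senior"],
    "knowledge": ["novice", "enthusiast", "expert"],
}
EMOTIONS = ["curious", "bored", "excited", "confused"]
BEHAVIORS = ["informative", "playful", "concise"]


@dataclass(frozen=True)
class VisitorProfile:
    gender: str
    ethnicity: str
    age_group: str
    knowledge: str


@dataclass(frozen=True)
class Scenario:
    visitor: VisitorProfile
    emotion: str
    interrupted: bool
    chatbot_behavior: str
    artwork: str


def sample_scenarios(n, artworks, seed=0):
    """Draw n random scenario specifications; the same seed gives the same list."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        visitor = VisitorProfile(**{k: rng.choice(v) for k, v in PROFILE_OPTIONS.items()})
        scenarios.append(Scenario(
            visitor=visitor,
            emotion=rng.choice(EMOTIONS),
            interrupted=rng.random() < 0.3,  # hypothetical interruption rate
            chatbot_behavior=rng.choice(BEHAVIORS),
            artwork=rng.choice(artworks),
        ))
    return scenarios


def build_prompt(s, num_turns=8):
    """Render one scenario as a plain-text prompt for a (hypothetical) LLM generator."""
    v = s.visitor
    lines = [
        f"Write a {num_turns}-turn dialogue between a museum chatbot and a visitor about '{s.artwork}'.",
        f"Visitor: {v.age_group}, {v.gender}, {v.ethnicity}, {v.knowledge} in art; feeling {s.emotion}.",
        f"Chatbot style: {s.chatbot_behavior}.",
    ]
    if s.interrupted:
        lines.append("Include one turn where the visitor interrupts the chatbot.")
    # Hypothetical grounding instruction aimed at reducing hallucinations.
    lines.append("Only state facts about the artwork that are given in the context.")
    return "\n".join(lines)


if __name__ == "__main__":
    for sc in sample_scenarios(2, ["The Garden of Earthly Delights", "Las Meninas"]):
        print(build_prompt(sc))
        print("---")
```

Seeding the sampler keeps generation runs reproducible, and keeping each scenario as a frozen record makes it straightforward to audit the distribution of profile attributes afterwards, which is one way to check for the kinds of bias the abstract mentions.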