Masahiro Mizukami
Facilitating self-disclosure without causing discomfort remains a difficult task, especially for AI systems. In real-world applications such as career counseling, wellbeing support, and onboarding interviews, eliciting personal information such as concerns, goals, and personality traits is essential; however, asking such questions directly often leads to discomfort and disengagement. We address this issue with RaPSIL (Rapport-aware Preference-guided Self-disclosure Interview Learner), a two-stage LLM-based system that fosters natural, engaging conversations to promote self-disclosure. In the first stage, RaPSIL selectively imitates interviewer utterances, using LLMs as multi-perspective judges to evaluate candidates for both strategic effectiveness and social sensitivity. In the second stage, it conducts self-play simulations, using the Reflexion framework to analyze failures and expand a database with both successful and problematic utterances. This dual learning process allows RaPSIL to go beyond simple imitation, improving its ability to handle sensitive topics naturally by learning from failed utterances as well as successful ones. In a comprehensive evaluation with real users, RaPSIL outperformed baselines in enjoyability, warmth, and willingness to re-engage, while also capturing self-descriptions more accurately. Notably, its impression scores remained stable even during prolonged interactions, demonstrating its ability to balance rapport building with effective information elicitation. These results show that RaPSIL enables socially aware AI interviewers capable of eliciting sensitive personal information while maintaining user trust and comfort, an essential capability for real-world dialogue systems.
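The abstract describes the first stage only at a high level. The following Python sketch illustrates the general idea of multi-perspective LLM judging for utterance selection; it is an illustration under stated assumptions, not RaPSIL's implementation, and call_llm, the rubric wording, and the 1-to-5 threshold are all hypothetical.

```python
# Minimal sketch (not the paper's implementation) of first-stage utterance
# selection: each candidate interviewer utterance is scored by LLM "judges"
# from two perspectives and kept only if both scores clear a threshold.
# `call_llm` is a hypothetical stand-in for any chat-completion API.

from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real LLM API call that returns "1"-"5".
    return "4"

@dataclass
class Judgement:
    effectiveness: int   # how well the utterance elicits self-disclosure
    sensitivity: int     # how socially appropriate and comfortable it is

def judge(utterance: str, context: str) -> Judgement:
    eff = int(call_llm(
        f"Rate 1-5 how effectively this interviewer utterance elicits "
        f"self-disclosure.\nContext: {context}\nUtterance: {utterance}"))
    sen = int(call_llm(
        f"Rate 1-5 how socially sensitive and comfortable this utterance is.\n"
        f"Context: {context}\nUtterance: {utterance}"))
    return Judgement(eff, sen)

def select_for_imitation(candidates, context, threshold=4):
    # Keep utterances that both judges rate highly; these would form the
    # database of utterances the system imitates.
    kept = []
    for utt in candidates:
        j = judge(utt, context)
        if j.effectiveness >= threshold and j.sensitivity >= threshold:
            kept.append((utt, j))
    return kept
```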
Long-term chatbots are expected to develop relationships with their users. A major trend in recent long-term chatbot research is to train systems on virtual long-term chat data called Multi-Session Chat (MSC), which is collected by having crowd workers play speakers with defined personas across multiple text-chat sessions. However, no study has investigated whether such virtual long-term chat actually simulates relationship building between speakers. To clarify the difference between the intimacy process in real long-term chat and that in MSC, this study collects real long-term chat and MSC in Japanese and compares them in terms of speech form and dialogue acts. The analysis of these factors suggests that, compared with real long-term chats, MSC speakers unnaturally behave as if they were already close, using non-polite speech levels, while also behaving as if the relationship were shallow, asking more questions than speakers in real long-term chats.
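As a rough illustration of the kind of comparison described above, the sketch below counts non-polite speech forms and question dialogue acts in each corpus. The data format, label names, and toy utterances are assumptions made for illustration, not the study's actual annotation scheme or data.

```python
# Illustrative sketch: compare two corpora on the two factors discussed above,
# assuming each utterance carries a speech-form and a dialogue-act label.

from collections import Counter

def summarize(utterances):
    forms = Counter(u["speech_form"] for u in utterances)
    acts = Counter(u["dialogue_act"] for u in utterances)
    n = len(utterances)
    return {
        "non_polite_rate": forms["non-polite"] / n,
        "question_rate": acts["question"] / n,
    }

# Toy data only; real corpora would contain full annotated sessions.
real_chat = [{"speech_form": "polite", "dialogue_act": "inform"},
             {"speech_form": "non-polite", "dialogue_act": "question"}]
msc_chat = [{"speech_form": "non-polite", "dialogue_act": "question"},
            {"speech_form": "non-polite", "dialogue_act": "question"}]

print("real:", summarize(real_chat))
print("MSC :", summarize(msc_chat))
```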
This paper proposes a taxonomy of errors in chat-oriented dialogue systems. Previously, two taxonomies have been proposed: one theory-driven and the other data-driven. The former suffers from the fact that dialogue theories for human conversation are often not appropriate for categorizing errors made by chat-oriented dialogue systems. The latter is limited in that it can only cover errors of systems for which data are available. This paper integrates these two taxonomies to create a comprehensive taxonomy of errors in chat-oriented dialogue systems. We found that, with our integrated taxonomy, errors can be annotated reliably, with a higher Fleiss' kappa than with the previously proposed taxonomies.
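Fleiss' kappa, the agreement statistic cited above, measures how consistently multiple annotators assign each error to one category. The sketch below is a standard textbook implementation; the toy count matrix is invented for illustration and is not the paper's annotation data.

```python
# Fleiss' kappa for multiple annotators labeling each item with one category.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts[i, j] = number of annotators who put item i into category j."""
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]          # assumes same number of raters per item
    # Observed agreement per item, then averaged over items.
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category proportions.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = (p_j ** 2).sum()
    return (p_bar - p_e) / (1 - p_e)

# Toy example: 5 annotated errors, 3 annotators, 4 error categories.
toy = np.array([[3, 0, 0, 0],
                [2, 1, 0, 0],
                [0, 3, 0, 0],
                [0, 0, 2, 1],
                [0, 0, 0, 3]])
print(round(fleiss_kappa(toy), 3))
```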
Having consistent personalities is important for chatbots if we want them to be believable. Typically, many question-answer pairs are prepared by hand to achieve consistent responses; however, creating such pairs is costly. In this study, our goal is to collect a large number of question-answer pairs for a particular character by using role play-based question answering, in which multiple users play the role of a certain character and respond to questions posed by online users. Focusing on two famous characters, we conducted a large-scale experiment to collect question-answer pairs from real users. We evaluated the effectiveness of role play-based question answering and found that the pairs collected with our proposed method lead to good-quality chatbots that exhibit consistent personalities.
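One simple way to turn such collected question-answer pairs into a character chatbot is nearest-question retrieval. The sketch below uses TF-IDF similarity purely as an illustrative assumption, since the abstract does not specify the authors' response-selection method, and the toy pairs are invented.

```python
# Hedged sketch: retrieve the most similar collected question and reuse its
# answer, so the character's persona stays consistent across responses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy pairs standing in for the large collection gathered via role play.
qa_pairs = [
    ("What is your favorite food?", "I could eat ramen every single day."),
    ("Where do you live?", "A small town by the sea, same as always."),
    ("Do you like cats?", "Of course. Cats are far better company than people."),
]

questions = [q for q, _ in qa_pairs]
vectorizer = TfidfVectorizer()
question_vecs = vectorizer.fit_transform(questions)

def respond(user_question: str) -> str:
    # Find the collected question closest to the user's and return its answer.
    sims = cosine_similarity(vectorizer.transform([user_question]), question_vecs)
    best = sims.argmax()
    return qa_pairs[best][1]

print(respond("Tell me about your favorite food."))
```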