2023
Are Experts Needed? On Human Evaluation of Counselling Reflection Generation
Zixiu Wu | Simone Balloccu | Ehud Reiter | Rim Helaoui | Diego Reforgiato Recupero | Daniele Riboni
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reflection is a crucial counselling skill where the therapist conveys to the client their interpretation of what the client said. Language models have recently been used to generate reflections automatically, but human evaluation is challenging, particularly due to the cost of hiring experts. Laypeople-based evaluation is less expensive and easier to scale, but its quality is unknown for reflections. Therefore, we explore whether laypeople can be an alternative to experts in evaluating a fundamental quality aspect: coherence and context-consistency. We do so by asking a group of laypeople and a group of experts to annotate both synthetic reflections and human reflections from actual therapists. We find that both laypeople and experts are reliable annotators and that they have moderate-to-strong inter-group correlation, which shows that laypeople can be trusted for such evaluations. We also discover that GPT-3 mostly produces coherent and consistent reflections, and we explore changes in evaluation results when the source of synthetic reflections changes to GPT-3 from the less powerful GPT-2.
2022
Beyond calories: evaluating how tailored communication reduces emotional load in diet-coaching
Simone Balloccu | Ehud Reiter
Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)
Dieting is a behaviour change task that is difficult for many people to conduct successfully. This is due to many factors, including stress and cost. Mobile applications offer an alternative to traditional coaching. However, previous work on app evaluation only focused on dietary outcomes, ignoring users’ emotional state despite its influence on eating habits. In this work, we introduce a novel evaluation of the effects that tailored communication can have on the emotional load of dieting. We implement this by augmenting a traditional diet-app with affective NLG, text-tailoring and persuasive communication techniques. We then run a short two-week experiment and check dietary outcomes, user feedback on the produced text and, most importantly, its impact on emotional state, measured through the PANAS questionnaire. Results show that tailored communication significantly improved users’ emotional state, compared to an app-only control group.
Comparing informativeness of an NLG chatbot vs graphical app in diet-information domain
Simone Balloccu | Ehud Reiter
Proceedings of the 15th International Conference on Natural Language Generation
Towards In-Context Non-Expert Evaluation of Reflection Generation for Counselling Conversations
Zixiu Wu | Simone Balloccu | Rim Helaoui | Diego Reforgiato Recupero | Daniele Riboni
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Reflection is an essential counselling strategy, where the therapist listens actively and responds with their own interpretation of the client’s words. Recent work leveraged pre-trained language models (PLMs) to approach reflection generation as a promising tool to aid counsellor training. However, those studies used limited dialogue context for modelling and simplistic error analysis for human evaluation. In this work, we take the first step towards addressing those limitations. First, we fine-tune PLMs on longer dialogue contexts for reflection generation. Then, we collect free-text error descriptions from non-experts about generated reflections, identify common patterns among them, and accordingly establish discrete error categories using thematic analysis. Based on this scheme, we plan as future work a mass non-expert error annotation phase for generated reflections, followed by an expert-based validation phase addressing the question of “whether a coherent and consistent response is a good reflection”.
2020
How are you? Introducing stress-based text tailoring
Simone Balloccu | Ehud Reiter | Alexandra Johnstone | Claire Fyfe
Proceedings of the Workshop on Intelligent Information Processing and Natural Language Generation