Darshna Parmar


2025

Measuring Prosodic Richness in LLM-Generated Responses for Conversational Recommendation
Darshna Parmar | Pramit Mazumdar
Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models

This paper presents a novel framework for stylistic evaluation in conversational recommendation systems (CRS), focusing on the prosodic and expressive qualities of generated responses. While prior work has predominantly emphasized semantic relevance and recommendation accuracy, the stylistic fidelity of model outputs remains underexplored. We introduce the Prosodic Richness Score (PRS), a composite metric that quantifies expressive variation through structural pauses, emphatic lexical usage, and rhythmic variability. Using PRS, we conduct both sentence-level and turn-level analyses across six contemporary large language models (LLMs) on two benchmark CRS datasets: ReDial, representing goal-oriented dialogue, and INSPIRED, which incorporates stylized social interaction. Empirical results reveal statistically significant differences (p < 0.01) in PRS between human-written and model-generated responses, highlighting the limitations of current LLMs in reproducing natural prosodic variation. Our findings advocate for broader evaluation of stylistic attributes in dialogue generation, offering a scalable approach to enhance expressive language modeling in CRS.
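The abstract names the three PRS components but not their formulas. As a rough illustration of how such a composite metric might be assembled, here is a minimal Python sketch; the pause proxy (punctuation counts), the emphasis lexicon, the rhythm measure (dispersion of word lengths as a stand-in for syllabic timing), and the equal weighting are all assumptions made for illustration, not the paper's formulation.

```python
import re
import statistics

# Hypothetical emphasis lexicon; the paper's actual lexical resources are not specified here.
EMPHATIC_WORDS = {"absolutely", "really", "so", "totally", "definitely", "love", "amazing"}

def prosodic_richness_sketch(text: str) -> float:
    """Toy composite score over structural pauses, emphatic lexis, and rhythmic variability.
    Equal weighting and these component proxies are illustrative, not the PRS definition."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if not tokens:
        return 0.0
    # Structural pauses: punctuation marks that typically signal a spoken pause, per token.
    pauses = len(re.findall(r"[,;:.!?-]", text)) / len(tokens)
    # Emphatic lexical usage: share of tokens drawn from the emphasis lexicon.
    emphasis = sum(t in EMPHATIC_WORDS for t in tokens) / len(tokens)
    # Rhythmic variability: spread of word lengths, a crude proxy for syllabic rhythm.
    rhythm = statistics.pstdev(len(t) for t in tokens) / max(len(t) for t in tokens)
    return (pauses + emphasis + rhythm) / 3.0

# An expressive reply should score above a flat one under any reasonable weighting.
print(prosodic_richness_sketch("Oh, absolutely -- you would love it! It's so, so good."))
print(prosodic_richness_sketch("The movie is available."))
```

Turn-level analysis of the kind the paper describes would then reduce to averaging such sentence-level scores over each dialogue turn and comparing the human and model distributions.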

Emotionally Aware or Tone-Deaf? Evaluating Emotional Alignment in LLM-Based Conversational Recommendation Systems
Darshna Parmar | Pramit Mazumdar
Proceedings of the 9th Widening NLP Workshop

Recent advances in Large Language Models (LLMs) have enhanced the fluency and coherence of Conversational Recommendation Systems (CRSs), yet emotional intelligence remains a critical gap. In this study, we systematically evaluate the emotional behavior of six state-of-the-art LLMs in CRS settings using the ReDial and INSPIRED datasets. We propose an emotion-aware evaluation framework incorporating metrics such as Emotion Alignment, Emotion Flatness, and per-emotion F1-scores. Our analysis shows that most models frequently default to emotionally flat or mismatched responses, often misaligning with user affect (e.g., joy misread as neutral). We further examine patterns of emotional misalignment and their impact on user-centric qualities such as personalization, justification, and satisfaction. Through qualitative analysis, we demonstrate that emotionally aligned responses enhance user experience, while misalignments erode trust and relevance. This work highlights the need for emotion-aware design in CRS and provides actionable insights for improving affective sensitivity in LLM-generated recommendations.
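The abstract lists three metrics without their definitions. The sketch below shows one plausible operationalization: alignment as the fraction of turns where the response emotion matches the user's, flatness as the share of responses labeled neutral, and per-emotion F1 via scikit-learn. The emotion label set and these definitions are assumptions for illustration, not the paper's specification.

```python
from sklearn.metrics import f1_score

# Hypothetical emotion taxonomy; the paper's actual label set may differ.
EMOTIONS = ["joy", "sadness", "anger", "surprise", "neutral"]

def emotion_alignment(user_emotions, response_emotions):
    """Fraction of turns where the response emotion matches the user's emotion:
    one illustrative reading of 'Emotion Alignment', not the paper's definition."""
    matches = sum(u == r for u, r in zip(user_emotions, response_emotions))
    return matches / len(user_emotions)

def emotion_flatness(response_emotions):
    """Share of responses labeled neutral, a simple proxy for 'Emotion Flatness'."""
    return sum(e == "neutral" for e in response_emotions) / len(response_emotions)

# Toy turn-level labels: user affect vs. the emotion expressed by the model's reply.
user = ["joy", "joy", "sadness", "anger", "neutral"]
model = ["neutral", "joy", "neutral", "anger", "neutral"]

print("alignment:", emotion_alignment(user, model))   # 0.6
print("flatness:", emotion_flatness(model))           # 0.6
print("per-emotion F1:", dict(zip(
    EMOTIONS,
    f1_score(user, model, labels=EMOTIONS, average=None, zero_division=0))))
```

On this toy example, the joy-misread-as-neutral pattern the abstract mentions shows up directly: joy's F1 falls while neutral's rises, exactly the failure mode such per-emotion scores are meant to expose.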