Navid Madani
2025
A Recipe For Building a Compliant Real Estate Chatbot
Navid Madani | Anusha Bagalkotkar | Supriya Anand | Gabriel Arnson | Rohini K. Srihari | Kenneth Joseph
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
In recent years, there has been significant effort to align large language models with human preferences. This work focuses on developing a chatbot specialized in the real estate domain, with an emphasis on incorporating compliant behavior to ensure it can be used without perpetuating discriminatory practices like steering and redlining, which have historically plagued the real estate industry in the United States. Building on prior work, we present a method for generating a synthetic general instruction-following dataset, along with safety data. Through extensive evaluations and benchmarks, we fine-tune a Llama-3-8B-Instruct model and demonstrate that its performance can be enhanced significantly to match large closed-source models like GPT-4o while making it safer and more compliant. We open-source the model, data, and code to support further development and research in the community.
ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents
Navid Madani | Rohini Srihari
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) increasingly power mental-health chatbots, yet the field still lacks a scalable, theory-grounded way to decide which model is more effective to deploy. We present ESC-Judge, the first end-to-end evaluation framework that (i) grounds head-to-head comparison of Emotional-Support LLMs (ES-LLMs) in an established psychological theory—Clara Hill’s Exploration–Insight–Action (E-I-A) counselling model—thereby delivering a structured, interpretable lens on performance, and (ii) fully automates the pipeline at scale. ESC-Judge proceeds in three stages: (1) it synthesizes realistic help-seeker roles by sampling empirically salient attributes (stressors, personality, life history); (2) it has two candidate ES-Agents conduct separate sessions with the same role, isolating model-specific strategies; and (3) it asks a specialised judge LLM to issue pairwise preferences across rubric-anchored skills that exhaustively cover the E-I-A spectrum. In our empirical study, ESC-Judge matches PhD-level annotators in 85% of Exploration, 83% of Insight, and 86% of Action decisions, demonstrating human-level reliability at a fraction of the cost. We release all code, prompts, synthetic roles, transcripts, and judgment scripts to catalyze transparent progress in emotionally supportive AI.
Steering Conversational Large Language Models for Long Emotional Support Conversations
Navid Madani | Rohini Srihari
Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025)
In this study, we address the challenge of consistently following emotional support strategies in long conversations by large language models (LLMs). We introduce the Strategy-Relevant Attention (SRA) metric, a model-agnostic measure designed to evaluate the effectiveness of LLMs in adhering to strategic prompts in emotional support contexts. By analyzing conversations within the Emotional Support Conversations dataset (ESConv) using LLaMA models, we demonstrate that SRA is significantly correlated with a model’s ability to sustain the outlined strategy throughout the interactions. Our findings reveal that the application of SRA-informed prompts leads to enhanced strategic adherence, resulting in conversations that more reliably exhibit the desired emotional support strategies over longer conversations. Furthermore, we contribute a comprehensive, multi-branch synthetic conversation dataset for ESConv, featuring a variety of strategy continuations informed by our optimized prompting method. The code and data are publicly available on our GitHub.