Bhanu Harsha Yanamadala


2026

Counter Narrative (CN) generation with Large Language Models (LLMs) offers a scalable way to combat hate speech by producing targeted responses that challenge harmful content. However, existing methods typically rely on costly post-training or fine-tuning to improve the diversity and quality of the generated narratives. We introduce a fine-tuning-free prompt optimization technique that improves CN effectiveness without additional model training, making it resource-efficient and readily deployable. We evaluate it on hate speech datasets in English and Tamil, using both reference-based metrics and rubric-based LLM-as-a-judge scoring to capture multiple dimensions of narrative quality. Across multiple LLMs, our approach consistently outperforms vanilla prompting baselines, transfers well across models, and adapts to new evaluation metrics without architectural or procedural changes. Our findings suggest that carefully optimized prompting strategies can match or exceed the performance of more resource-intensive approaches, offering a practical path toward scalable hate speech intervention.
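
To make the idea of fine-tuning-free prompt optimization with a swappable metric concrete, the following is a minimal, generic sketch; it is not the algorithm evaluated in this work. It assumes a fixed pool of candidate prompt templates and picks the one with the highest mean score on a small development set. All names (`select_prompt`, `token_f1`, `stub_generate`, the templates, and the placeholder data) are hypothetical, and the stub generator stands in for a real LLM call.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

Generator = Callable[[str], str]      # prompt -> generated counter narrative
Metric = Callable[[str, str], float]  # (output, reference) -> score


def select_prompt(
    templates: Sequence[str],
    dev_set: Sequence[Tuple[str, str]],
    generate: Generator,
    metric: Metric,
) -> Tuple[str, float]:
    """Return the template with the highest mean metric score on dev_set.

    No model weights are updated; only the prompt text changes.
    """
    def avg_score(template: str) -> float:
        return mean(
            metric(generate(template.format(hate_speech=hs)), ref)
            for hs, ref in dev_set
        )

    scored = [(avg_score(t), t) for t in templates]
    best_score, best_template = max(scored, key=lambda pair: pair[0])
    return best_template, best_score


def token_f1(output: str, reference: str) -> float:
    """Toy reference-based metric (assumption): F1 over unique lowercase tokens."""
    out, ref = set(output.lower().split()), set(reference.lower().split())
    overlap = len(out & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(out), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def stub_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call, so the sketch runs offline."""
    if "empathetic" in prompt:
        return "everyone deserves respect and dignity"
    return "that is wrong"


TEMPLATES = [
    "Reply to: {hate_speech}",
    "Write an empathetic, factual reply to: {hate_speech}",
]
# Placeholder input and reference; a real dev set would hold annotated pairs.
DEV_SET = [("<hateful post placeholder>", "everyone deserves respect")]

best_template, best_score = select_prompt(TEMPLATES, DEV_SET, stub_generate, token_f1)
print(best_template, round(best_score, 2))
```

Because the metric is passed in as a plain function, replacing `token_f1` with a rubric-based judge score would require no change to the selection loop, which is the property the abstract describes as adapting to new evaluation metrics.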