Aswini Padhi


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2024

pdf bib
Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF
Amey Hengle | Aswini Padhi | Sahajpreet Singh | Anil Bandhakavi | Md Shad Akhtar | Tanmoy Chakraborty
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Counterspeech, defined as a response to mitigate online hate speech, is increasingly used as a non-censorial solution. The effectiveness of addressing hate speech involves dispelling the stereotypes, prejudices, and biases often subtly implied in brief, single-sentence statements or abuses. These expressions challenge language models, especially in seq2seq tasks, as model performance typically excels with longer contexts. Our study introduces CoARL, a novel framework enhancing counterspeech generation by modeling the pragmatic implications underlying social biases in hateful statements. The first two phases of CoARL involve sequential multi-instruction tuning, teaching the model to understand intents, reactions, and harms of offensive statements, and then learning task-specific low-rank adapter weights for generating intent-conditioned counterspeech. The final phase uses reinforcement learning to fine-tune outputs for effectiveness and nontoxicity. CoARL outperforms existing benchmarks in intent-conditioned counterspeech generation, showing an average improvement of ∼3 points in intent-conformity and ∼4 points in argument-quality metrics. Extensive human evaluation supports CoARL’s efficacy in generating superior and more context-appropriate responses compared to existing systems, including prominent LLMs like ChatGPT.