Aditya Mate


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
How to Fine-Tune Safely on a Budget: Model Adaptation Using Minimal Resources
Anh C. Pham | Mihir Thalanki | Michael Sun | Aditya Chaloo | Ankita Gupta | Tian Xia | Aditya Mate | Ehi Nosakhare | Soundararajan Srinivasan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Supervised fine-tuning (SFT) on benign data can paradoxically erode a language model’s safety alignment, a phenomenon known as catastrophic forgetting of safety behaviors. Although prior work shows that randomly adding safety examples can reduce harmful output, the principles that make certain examples more effective than others remain poorly understood. This paper investigates the hypothesis that the effectiveness of a safety example is governed by two key factors: its instruction-response behavior (e.g., refusal vs. explanation) and its semantic diversity across harm categories. We systematically evaluate sampling strategies based on these axes and find that structured, diversity-aware sampling significantly improves model safety. Our method reduces harmfulness by up to 41% while adding only 0.05% more data to the fine-tuning set.