Yogesh Kulkarni


2025

VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment
Yogesh Kulkarni | Pooyan Fazli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Video-language models (Video-LLMs) excel at understanding video content but struggle with spatial relationships, temporal ordering, and cross-frame continuity. To address these limitations, we introduce VideoPASTA (Preference Alignment with Spatio-Temporal-Cross Frame Adversaries), a framework that enhances Video-LLMs through targeted preference optimization. VideoPASTA trains models to distinguish accurate video representations from carefully crafted adversarial examples that deliberately violate spatial, temporal, or cross-frame relationships. With only 7,020 preference pairs and Direct Preference Optimization, VideoPASTA enables models to learn robust representations that capture fine-grained spatial details and long-range temporal dynamics. Experiments demonstrate that VideoPASTA is model-agnostic and significantly improves performance when applied to various state-of-the-art Video-LLMs, achieving gains of up to +3.8 percentage points on LongVideoBench, +4.1 on VideoMME, and +4.0 on MVBench. These results show that targeted alignment, rather than massive pretraining or architectural modifications, effectively addresses core video-language challenges. Notably, VideoPASTA achieves these improvements without any human annotation or captioning, relying solely on 32-frame sampling. This efficiency makes our approach a scalable plug-and-play solution that integrates with existing models while preserving their original capabilities.

NeuroReset: LLM Unlearning via Dual Phase Mixed Methodology
Dhwani Bhavankar | Het Sevalia | Shubh Agarwal | Yogesh Kulkarni | Rahee Walambe | Ketan Kotecha
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents a method for unlearning sensitive information from large language models, developed for the SemEval-2025 Task 4 challenge. The unlearning pipeline consists of two phases: in phase I, the model is instructed to forget specific datasets; in phase II, the model is stabilized using a retention dataset. This approach achieved a final score of 0.420 in the 7B-parameter challenge, earning the 2nd honorary mention, and a score of 0.36 in the 1B-parameter challenge, placing 13th. The paper presents a background study, a brief literature review, and a gap analysis, as well as the methodology employed in our work, titled NeuroReset. The training methodology and evaluation metrics are also presented, and the trade-offs between unlearning efficiency and model performance are discussed. The contributions of the paper are systematic unlearning, a comparative analysis of unlearning methods, and an empirical analysis of model performance after unlearning.