Yogesh Kulkarni
2025
VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment
Yogesh Kulkarni | Pooyan Fazli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Video-language models (Video-LLMs) excel at understanding video content but struggle with spatial relationships, temporal ordering, and cross-frame continuity. To address these limitations, we introduce VideoPASTA (Preference Alignment with Spatio-Temporal-Cross Frame Adversaries), a framework that enhances Video-LLMs through targeted preference optimization. VideoPASTA trains models to distinguish accurate video representations from carefully crafted adversarial examples that deliberately violate spatial, temporal, or cross-frame relationships. With only 7,020 preference pairs and Direct Preference Optimization, VideoPASTA enables models to learn robust representations that capture fine-grained spatial details and long-range temporal dynamics. Experiments demonstrate that VideoPASTA is model agnostic and significantly improves performance, achieving gains of up to +3.8 percentage points on LongVideoBench, +4.1 on VideoMME, and +4.0 on MVBench when applied to various state-of-the-art Video-LLMs. These results demonstrate that targeted alignment, rather than massive pretraining or architectural modifications, effectively addresses core video-language challenges. Notably, VideoPASTA achieves these improvements without any human annotation or captioning, relying solely on 32-frame sampling. This efficiency makes our approach a scalable plug-and-play solution that integrates seamlessly with existing models while preserving their original capabilities.
NeuroReset: LLM Unlearning via Dual-Phase Mixed Methodology
Dhwani Bhavankar | Het Sevalia | Shubh Agarwal | Yogesh Kulkarni | Rahee Walambe | Ketan Kotecha
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents a method for unlearning sensitive information from large language models, as applied to the SemEval 2025 Task 4 challenge. The unlearning pipeline consists of two phases: in Phase I, the model is instructed to forget specific datasets, and in Phase II, the model is stabilized using a retention dataset. Unlearning with these methods secured a final score of 0.420, earning the 2nd honorary mention in the 7B-parameter challenge, and a score of 0.36, placing 13th in the 1B-parameter challenge. The paper presents a background study, a brief literature review, and a gap analysis, as well as the methodology employed in our work, NeuroReset. The training methodology and evaluation metrics are also presented, and the trade-offs between unlearning efficiency and model performance are discussed. The contributions of the paper are a systematic unlearning pipeline, a comparative analysis of unlearning methods, and an empirical analysis of model performance after unlearning.