Vijaykrishnan Narayanan
2026
When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise
Philip Wootaek Shin | Ajay Narayanan Sridhar | Lakshmi Sivani Devarapalli | Rui Zhang | Jack Sampson | Vijaykrishnan Narayanan
Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
Vision–language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, since avoiding it requires accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while they offer partial improvements, they do not fully resolve hallucinations. Our results reveal a gap between perceptual robustness and relational understanding, highlighting the need for more robust, geometry-aware VLMs.
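As a rough illustration of the two perturbation types the abstract names, the sketch below rotates a small grayscale image and adds noise to it. The abstract does not state the rotation angles, the noise distribution, or the noise strength, so the 90-degree rotation, the Gaussian noise model, and the function names `rotate_90_ccw` and `add_gaussian_noise` are all assumptions made for this example, not the authors' setup.

```python
import random


def rotate_90_ccw(image):
    """Rotate a 2D grid (list of rows) counter-clockwise by 90 degrees."""
    # Transposing and then reversing the row order gives a CCW quarter turn.
    return [list(row) for row in zip(*image)][::-1]


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise (assumed noise model) to 8-bit pixels."""
    rng = random.Random(seed)  # fixed seed so the perturbation is repeatable
    return [
        # Round to the nearest integer and clamp to the valid 0..255 range.
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


image = [[10, 20, 30],
         [40, 50, 60]]
rotated = rotate_90_ccw(image)            # 3 rows x 2 columns
noisy = add_gaussian_noise(image, sigma=5.0)
```

An evaluation in this spirit would pass `rotated` or `noisy` to the model in place of `image` and compare its answers to relation questions against the unperturbed case.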
2022
Token and Head Adaptive Transformers for Efficient Natural Language Processing
Chonghan Lee | Md Fahim Faysal Khan | Rita Brugarolas Brufau | Ke Ding | Vijaykrishnan Narayanan
Proceedings of the 29th International Conference on Computational Linguistics
While pre-trained language models like BERT have achieved impressive results on various natural language processing tasks, deploying them on resource-restricted devices is challenging due to their intensive computational cost and memory footprint. Previous approaches mainly focused on training smaller versions of a BERT model with competitive accuracy under limited computational resources. In this paper, we extend the Length Adaptive Transformer and propose the Token and Head Adaptive Transformer, which can compress and accelerate various BERT-based models via simple fine-tuning. We train a transformer with a progressive token and head pruning scheme, eliminating a large number of redundant tokens and attention heads in the later layers. Then, we conduct a multi-objective evolutionary search with the overall number of floating point operations (FLOPs) as its efficiency constraint to find joint token and head pruning strategies that maximize accuracy and efficiency under various computational budgets. Empirical studies show that a large portion of tokens and attention heads can be pruned while achieving superior performance compared to the baseline BERT-based models and Length Adaptive Transformers on various downstream NLP tasks. MobileBERT trained with our joint token and head pruning scheme achieves a GLUE score of 83.0, which is 1.4 higher than the Length Adaptive Transformer and 2.9 higher than the original model.
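The idea of progressively dropping tokens in later layers can be sketched in a few lines. The abstract does not say how tokens are scored or what pruning schedule is used, so the per-token importance scores, the fixed per-layer keep ratio, and the helper names `prune_tokens` and `progressive_lengths` below are hypothetical choices for illustration only, not the paper's method.

```python
def prune_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    # Rank token positions by score (hypothetical importance), highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore positional order among the survivors.
    kept = sorted(ranked[:keep])
    return [tokens[i] for i in kept]


def progressive_lengths(seq_len, keep_ratio, num_layers):
    """Sequence length entering each layer when every layer keeps a fixed fraction."""
    lengths = []
    current = seq_len
    for _ in range(num_layers):
        lengths.append(current)
        # Never prune below a single token.
        current = max(1, int(current * keep_ratio))
    return lengths


tokens = ["[CLS]", "the", "cat", "sat"]
scores = [0.9, 0.1, 0.6, 0.3]          # made-up importance scores
survivors = prune_tokens(tokens, scores, keep=2)
schedule = progressive_lengths(seq_len=128, keep_ratio=0.5, num_layers=4)
```

Because later layers see shorter sequences under such a schedule, their cost shrinks accordingly, which is the kind of saving a FLOPs-constrained search over pruning strategies would trade off against accuracy.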