Tarun Kumar


2026

Hallucinated object mentions remain a persistent failure mode of vision–language models (VLMs) across generation tasks such as image captioning and visual question answering: outputs may be fluent yet include entities not supported by visual evidence. Existing mitigation approaches often reduce hallucinations at the cost of degraded generation quality or require expensive retraining and task-specific supervision. We introduce CEBC, a lightweight, training-free framework for low-hallucination vision–language generation based on conformal evidence-bounded minimal editing. CEBC first produces a strong base output (via greedy decoding or best-of-K sampling), then applies an evidence-bounded editing step that minimally revises or suppresses unsupported object mentions using constraints derived from an external visual detector. Crucially, the evidence threshold is conformally calibrated on a small held-out set via quantiles of detector confidence scores, enabling explicit and controllable hallucination risk at test time. To balance factuality and informativeness, we further introduce a risk-first, quality-aware selection rule that prioritizes evidence-consistent generations while regularizing unnecessary length or lexical drift. Extensive experiments on MS-COCO and GQA for image captioning, and on POPE for VQA evaluation, across multiple VLMs demonstrate that CEBC consistently reduces hallucination rates (CHAIR_S, CHAIR_I, POPE) while maintaining or improving standard generation quality metrics (CIDEr, BLEU, CLIPScore). CEBC establishes a stronger factuality–quality Pareto frontier without any additional model training or access to paired supervision beyond an off-the-shelf detector.
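The conformal calibration step described above can be illustrated with a minimal sketch. It assumes a standard split-conformal quantile over detector scores of mentions known to be unsupported in the held-out set; the abstract does not give CEBC's exact procedure, so the function names (`conformal_threshold`, `filter_mentions`), the choice of calibrating on unsupported mentions, and the strict-inequality keep rule are all illustrative assumptions.

```python
import math


def conformal_threshold(unsupported_scores, alpha):
    """Split-conformal threshold (illustrative assumption, not CEBC's exact rule).

    unsupported_scores: detector confidences for held-out mentions that are
    known NOT to be in the image. Under exchangeability, a new unsupported
    mention scores above the returned tau with probability at most alpha.
    """
    n = len(unsupported_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Index of the conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: no finite threshold suffices.
        return float("inf")
    return sorted(unsupported_scores)[k - 1]


def filter_mentions(mentions, detector_scores, tau):
    """Keep only mentions whose detector confidence strictly exceeds tau.

    Mentions the detector never reported are treated as confidence 0.0.
    """
    return [m for m in mentions if detector_scores.get(m, 0.0) > tau]


# Toy calibration scores, made up for illustration only.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
tau = conformal_threshold(calibration, alpha=0.25)
kept = filter_mentions(
    ["dog", "frisbee", "bench"],
    {"dog": 0.9, "frisbee": 0.6, "bench": 0.2},
    tau,
)
```

In this sketch, lowering `alpha` raises the threshold and suppresses more mentions, which is one way the risk level could be made explicit and tunable at test time.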

2025

The rapid expansion of Large Language Models (LLMs) and natural language processing datasets has made exhaustive benchmark evaluations computationally prohibitive. Inspired by high-stakes competitions like the International Mathematical Olympiad, where a few well-chosen problems suffice to differentiate top performers, we present SubLIME, which reduces evaluation costs by 80% to 99% while preserving ranking fidelity. It trains a Rank Correlation Prediction (RCP) model that combines limited performance data from only 5-20 anchor LLMs with intrinsic dataset metrics (Difficulty, Quality, and Distributional Dispersion) to predict how closely a candidate subset reflects full-benchmark rankings. Guided by these predictions, SubLIME selects a “winning” subset (1-20% of the full dataset) for evaluating new LLMs, preserving global rankings significantly better than other data-efficient methods across ten diverse benchmarks.
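The quantity the RCP model is trained to predict, namely how closely a subset's ranking of models matches the full-benchmark ranking, can be computed directly when anchor results are available. The sketch below is not SubLIME itself: it measures Spearman rank correlation for candidate subsets on a toy 0/1 correctness matrix. The helper names (`average_ranks`, `spearman`, `subset_accuracy`, `best_subset`) and the choice of Spearman as the correlation measure are assumptions made for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Returns 0.0 when either side is constant (a convention chosen here,
    since a constant score vector carries no ranking information).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def subset_accuracy(correct, subset):
    """Per-model accuracy restricted to the item indices in `subset`."""
    return [sum(row[i] for i in subset) / len(subset) for row in correct]


def best_subset(correct, candidates):
    """Pick the candidate subset whose model ranking best matches the full set."""
    full = subset_accuracy(correct, range(len(correct[0])))
    return max(candidates, key=lambda s: spearman(full, subset_accuracy(correct, s)))


# Toy data: rows are anchor models, columns are benchmark items (1 = correct).
correct = [
    [1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
chosen = best_subset(correct, [[0, 1], [3, 4], [4]])
```

Exhaustively scoring subsets this way requires full anchor results for every candidate; the abstract's RCP model instead predicts this correlation from dataset metrics and limited anchor data.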

2020

Recent work on automatic sequential metaphor detection has involved recurrent neural networks initialized with different pre-trained word embeddings, sometimes combined with hand-engineered features. To capture lexical and orthographic information automatically, in this paper we propose adding character-based word representations. To contrast literal and contextual meaning, we also use a similarity network. We explore these components in two architectures for metaphor identification: a BiLSTM model and a Transformer encoder model similar to BERT. We participate in the Second Shared Task on Metaphor Detection on both the VUA and TOEFL datasets with these models. The experimental results demonstrate the effectiveness of our method, which outperforms all systems that participated in the previous shared task.
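The idea of contrasting literal and contextual meaning can be illustrated with a very small sketch. The abstract does not describe the similarity network's architecture, so the following is only an assumed stand-in: it scores the gap between a context-independent word vector and a contextual vector using cosine similarity. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def literal_context_gap(literal_vec, context_vec):
    """Larger values mean the contextual meaning drifts further from the literal one.

    Illustrative feature only; a learned similarity network would replace this.
    """
    return 1.0 - cosine_similarity(literal_vec, context_vec)


# Toy vectors, made up for illustration only.
literal_use = literal_context_gap([3.0, 4.0], [3.0, 4.0])
shifted_use = literal_context_gap([1.0, 0.0], [0.0, 1.0])
```

A gap feature of this kind could be fed to a sequence tagger alongside word and character-based representations, though the actual combination used in the paper is not specified in the abstract.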