Vasu Sharma


2025

pdf bib
Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language
Jesus Alvarez C | Daua Karajeanes | Ashley Prado | John Ruttan | Ivory Yang | Sean O’brien | Vasu Sharma | Kevin Zhu
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an empirical evaluation of GPT-4o and GPT-4o-mini for language identification. Our experiments reveal that while LLMs struggle with Comanche in zero-shot settings, few-shot prompting significantly improves performance, achieving near-perfect accuracy with just five examples. Our findings highlight the potential of targeted NLP methodologies in low-resource contexts and emphasize that visibility is the first step toward inclusion. By establishing a foundation for Comanche in NLP, we advocate for computational approaches that prioritize accessibility, cultural sensitivity, and community engagement.

pdf bib
DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization
Amitava Das | Suranjana Trivedy | Danush Khanna | Yaswanth Narsupalli | Basab Ghosh | Rajarshi Roy | Gurpreet Singh | Vinija Jain | Vasu Sharma | Aishwarya Naresh Reganti | Aman Chadha
Findings of the Association for Computational Linguistics: ACL 2025

The rapid advancement of large language models (LLMs) has revolutionized numerous applications, but presents significant challenges in aligning these models with diverse human values, ethical standards, and specific user preferences. Direct Preference Optimization (DPO) has become a cornerstone for preference alignment but is constrained by reliance on fixed divergence measures and limited feature transformations. We introduce DPO-Kernels, an innovative enhancement of DPO that integrates kernel methods to overcome these challenges through four key contributions: (i) Kernelized Representations: These representations enhance divergence measures by using polynomial, RBF, Mahalanobis, and spectral kernels for richer feature transformations. Additionally, we introduce a hybrid loss that combines embedding-based loss with probability-based loss; (ii) Divergence Alternatives: Beyond Kullback–Leibler (KL), we incorporate Jensen-Shannon, Hellinger, Rényi, Bhattacharyya, Wasserstein, and other f-divergences to boost stability and robustness; (iii) Data-Driven Selection: Choosing the optimal kernel-divergence pair among 28 combinations (4 kernels × 7 divergences) is challenging. We introduce automatic metrics that analyze the data to select the best kernel-divergence pair, eliminating the need for manual tuning; (iv) Hierarchical Mixture of Kernels (HMK): Combining local and global kernels for precise and large-scale semantic modeling. This approach automatically selects the optimal kernel mixture during training, enhancing modeling flexibility. DPO-Kernels achieve state-of-the-art generalization in factuality, safety, reasoning, and instruction following across 12 datasets. While alignment risks overfitting, Heavy-Tailed Self-Regularization (HT-SR) theory confirms that DPO-Kernels ensure robust generalization in LLMs. Comprehensive resources are available to facilitate further research and application of DPO-Kernels.

pdf bib
YinYang-Align: A new Benchmark for Competing Objectives and Introducing Multi-Objective Preference based Text-to-Image Alignment
Amitava Das | Yaswanth Narsupalli | Gurpreet Singh | Vinija Jain | Vasu Sharma | Suranjana Trivedy | Aman Chadha | Amit Sheth
Findings of the Association for Computational Linguistics: ACL 2025

Precise alignment in Text-to-Image (T2I) systems is crucial for generating visuals that reflect user intent while adhering to ethical and policy standards. Recent controversies, such as the Google Gemini-generated Pope image backlash, highlight the urgent need for robust alignment mechanisms. Building on alignment successes in Large Language Models (LLMs), this paper introduces YinYangAlign, a benchmarking framework designed to evaluate and optimize T2I systems across six inherently contradictory objectives. These objectives highlight core trade-offs, such as balancing faithfulness to prompts with artistic freedom and maintaining cultural sensitivity without compromising creativity. Alongside this benchmark, we propose the Contradictory Alignment Optimization (CAO) framework, an extension of Direct Preference Optimization (DPO), which employs multi-objective optimization techniques to address these competing goals. By leveraging per-axiom loss functions, synergy-driven global preferences, and innovative tools like the Synergy Jacobian, CAO achieves superior alignment across all objectives. Experimental results demonstrate significant improvements in fidelity, diversity, and ethical adherence, setting new benchmarks for the field. This work provides a scalable, effective approach to resolving alignment challenges in T2I systems while offering insights into broader AI alignment paradigms.

pdf bib
Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning
Shaun Lee Baek | Shaun Esua-Mensah | Cyrus Tsui | Sejan Vigneswaralingam | Abdullah Alali | Michael Lu | Vasu Sharma | Kevin Zhu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Large Language Models (LLMs) are primarily trained on high-resource natural languages, limiting their effectiveness in low-resource settings and in tasks requiring deep logical reasoning. This research introduces Rosetta-PL, a benchmark designed to evaluate LLMs’ logical reasoning and generalization capabilities in a controlled environment. We construct Rosetta-PL by translating a dataset of logical propositions from Lean into a custom logical language, which is then used to fine-tune an LLM (e.g., GPT-4o). Our experiments analyze the impact of the size of the dataset and the translation methodology on the performance of the model. Our results indicate that preserving logical relationships in the translation process significantly boosts precision, with accuracy plateauing beyond roughly 20,000 training samples. These insights provide valuable guidelines for optimizing LLM training in formal reasoning tasks and improving performance in various low-resource language applications.

pdf bib
FrontierScience Bench: Evaluating AI Research Capabilities in LLMs
Matthew Li | Santiago Torres-Garcia | Shayan Halder | Phani Kuppa | Sean O’Brien | Vasu Sharma | Kevin Zhu | Sunishchal Dev
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)

Large language models (LLMs) have shown remarkable capabilities across various tasks, yet their potential to reason about and construct scientific methodologies remains under explored. This work introduces a novel benchmark evaluating LLMs’ capacity to predict methodological details in AI research papers. We construct a dataset of 88 papers with redacted methodology sections and zero-shot prompt several state-of-the-art LLMs to generate methodology predictions. Our evaluation framework then employs a LLM-as-judge system with multiple LLM judges, majority voting, and self-omission techniques to minimize biases. We validate our LLM judge scores against human judgments. We then briefly analyze the judging results of our zero-shot prediction pipeline, suggesting that even state-of-the-art LLMs struggle with the task of methodology generation without more advanced techniques. This benchmark lays the groundwork for future research into evaluating LLMs’ potential for aiding in AI research.

pdf bib
DecepBench: Benchmarking Multimodal Deception Detection
Ethan Braverman | Vittesh Maganti | Nysa Lalye | Akhil Ganti | Michael Lu | Kevin Zhu | Vasu Sharma | Sean O’Brien
Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025)

Deception detection is crucial in domains such as security, forensics, and legal proceedings, as well as to ensure the reliability of AI systems. However, current approaches are limited by the lack of generalizable and interpretable benchmarks built on large and diverse datasets. To address this gap, we introduce DecepBench, a comprehensive and robust benchmark for multimodal deception detection. DecepBench includes an enhanced version of the DOLOS dataset, the largest game-show deception dataset (1,700 labeled video clips with audio). We augment each video clip with transcripts, introducing a third modality (text) and incorporating deception-related features identified in psychological research. We employ explainable methods to evaluate the relevance of key deception cues, providing insights into model limitations and guiding future improvements. Our enhancements to DOLOS, combined with these interpretable analyses, yield improved performance and a deeper understanding of multimodal deception detection.

2022

pdf bib
Tweet Based Reach Aware Temporal Attention Network for NFT Valuation
Ramit Sawhney | Megh Thakkar | Ritesh Soun | Atula Neerkaje | Vasu Sharma | Dipanwita Guhathakurta | Sudheer Chava
Findings of the Association for Computational Linguistics: EMNLP 2022

Non-Fungible Tokens (NFTs) are a relatively unexplored class of assets. Designing strategies to forecast NFT trends is an intricate task due to its extremely volatile nature. The market is largely driven by public sentiment and “hype”, which in turn has a high correlation with conversations taking place on social media platforms like Twitter. Prior work done for modelling stock market data does not take into account the extent of impact certain highly influential tweets and their authors can have on the market. Building on these limitations and the nature of the NFT market, we propose a novel reach-aware temporal learning approach to make predictions for forecasting future trends in the NFT market. We perform experiments on a new dataset consisting of over 1.3 million tweets and 180 thousand NFT transactions spanning over 15 NFT collections curated by us. Our model (TA-NFT) outperforms other state-of-the-art methods by an average of 36%. Through extensive quantitative and ablative analysis, we demonstrate the ability of our approach as a practical method for predicting NFT trends.

2018

pdf bib
BioAMA: Towards an End to End BioMedical Question Answering System
Vasu Sharma | Nitish Kulkarni | Srividya Pranavi | Gabriel Bayomi | Eric Nyberg | Teruko Mitamura
Proceedings of the BioNLP 2018 workshop

In this paper, we present a novel Biomedical Question Answering system, BioAMA: “Biomedical Ask Me Anything” on task 5b of the annual BioASQ challenge. In this work, we focus on a wide variety of question types including factoid, list based, summary and yes/no type questions that generate both exact and well-formed ‘ideal’ answers. For summary-type questions, we combine effective IR-based techniques for retrieval and diversification of relevant snippets for a question to create an end-to-end system which achieves a ROUGE-2 score of 0.72 and a ROUGE-SU4 score of 0.71 on ideal answer questions (7% improvement over the previous best model). Additionally, we propose a novel NLI-based framework to answer the yes/no questions. To train the NLI model, we also devise a transfer-learning technique by cross-domain projection of word embeddings. Finally, we present a two-stage approach to address the factoid and list type questions by first generating a candidate set using NER taggers and ranking them using both supervised or unsupervised techniques.

pdf bib
Cyclegen: Cyclic consistency based product review generator from attributes
Vasu Sharma | Harsh Sharma | Ankita Bishnu | Labhesh Patel
Proceedings of the 11th International Conference on Natural Language Generation

In this paper we present an automatic review generator system which can generate personalized reviews based on the user identity, product identity and designated rating the user wishes to allot to the review. We combine this with a sentiment analysis system which performs the complimentary task of assigning ratings to reviews based purely on the textual content of the review. We introduce an additional loss term to ensure cyclic consistency of the sentiment rating of the generated review with the conditioning rating used to generate the review. The introduction of this new loss term constraints the generation space while forcing it to generate reviews adhering better to the requested rating. The use of ‘soft’ generation and cyclic consistency allows us to train our model in an end to end fashion. We demonstrate the working of our model on product reviews from Amazon dataset.

2017

pdf bib
Segmentation Guided Attention Networks for Visual Question Answering
Vasu Sharma | Ankita Bishnu | Labhesh Patel
Proceedings of ACL 2017, Student Research Workshop

2016

pdf bib
Automatic tagging and retrieval of E-Commerce products based on visual features
Vasu Sharma | Harish Karnick
Proceedings of the NAACL Student Research Workshop

2015

pdf bib
Analyzing Newspaper Crime Reports for Identification of Safe Transit Paths
Vasu Sharma | Rajat Kulshreshtha | Puneet Singh | Nishant Agrawal | Akshay Kumar
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop