Vinayak Gupta

2026

PROPER Agents: Proactivity Driven Personalized Agents for Advancing Knowledge Gap Navigation
Kirandeep Kaur | Vinayak Gupta | Aditya Gupta | Chirag Shah
Findings of the Association for Computational Linguistics: ACL 2026

Current approaches to proactive assistance move beyond the ask-and-respond paradigm by anticipating user needs. In practice, they either burden users with clarifying questions or rely on context-based extrapolation, often leading to unnecessary or mistimed interventions. Such systems lack explicit mechanisms to model users’ knowledge gaps, resulting in incomplete or suboptimal task outcomes. To address this, we propose PROPER, a framework that explicitly models user-specific knowledge gaps in a controlled manner. Central to our approach is the notion of dimensions: structured, task-relevant factors that define the considerations required for effective task completion. Given a user query, the DGA (Dimension Generating Agent) identifies explicit dimensions (from the user’s query) and generates a set of candidate implicit dimensions capturing unarticulated aspects of the task. The RGA (Response Generating Agent) integrates both explicit and implicit dimensions selectively to produce personalized, context-aware, and proactively informative responses. We evaluate PROPER across multiple domains using a structured, gap-aware rubric that measures coverage, initiative appropriateness, and intent alignment. PROPER improves on quality scores and win rates across all domains, achieving up to 84% gains in single-turn evaluation and consistent dominance in multiturn interactions. All code for PROPER is available at: https://github.com/i-kiran/ProPer-Agent.

2024

pdf bib abs

Language Models Still Struggle to Zero-shot Reason about Time Series
Mike A Merrill | Mingtian Tan | Vinayak Gupta | Thomas Hartvigsen | Tim Althoff
Findings of the Association for Computational Linguistics: EMNLP 2024

Time series are critical for decision-making in fields like finance and healthcare. Their importance has driven a recent influx of works passing time series into language models, leading to non-trivial forecasting on some datasets. But it remains unknown whether non-trivial forecasting implies that language models can reason about time series. To address this gap, we generate a first-of-its-kind evaluation framework for time series reasoning, including formal tasks and a corresponding dataset of multi-scale time series paired with text captions across ten domains. Using these data, we probe whether language models achieve three forms of reasoning: (1) Etiological Reasoning—given an input time series, can the language model identify the scenario that most likely created it? (2) Question Answering—can a language model answer factual questions about time series? (3) Context-Aided Forecasting–does highly relevant textual context improve a language model’s time series forecasts? We find that otherwise highly-capable language models demonstrate surprisingly limited time series reasoning: they score marginally above random on etiological and question answering tasks (up to 30 percentage points worse than humans) and show modest success in using context to improve forecasting. These weakness showcase that time series reasoning is an impactful, yet deeply underdeveloped direction for language model research. We also make our datasets public to support further research in this direction.

Co-authors

Chirag Shah 1

Mingtian Tan 1

Venues

Findings2

Fix author