Bhavuk Singhal
2026
TrendPulse: A Simple yet Efficient Framework for Capturing Viral E-Commerce Spikes via LLM-Driven Contextualization
Arin Jain | Devashish Gupta | Bhavuk Singhal | Divay Jindal | Vinit Rongata | Ravindra Kumar Yadav
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Anticipating and capturing transient demand spikes is a critical challenge for e-commerce platforms, as reactive discovery mechanisms often fail to surface relevant products during rapid cultural or seasonal shifts. We propose TrendPulse, a three-stage framework that identifies regional search momentum, leverages a Large Language Model (LLM) to transform spikes into semantic trends, and employs a cross-attention mechanism to provide personalized catalog recommendations. Our comprehensive ablation experiments and evaluations validate the impact of each architectural component, showing consistent improvements across multiple critical business metrics. TrendPulse’s effectiveness is further validated through online A/B experiments, where it drives measurable gains in both business metrics and overall user experience. Finally, we outline the deployment strategy in detail, providing a reproducible blueprint that can be readily applied to similar industry-scale applications.
Grounded Multimodal In-Context Learning for Product Weight Estimation at Scale in E-commerce
Bhavuk Singhal | Arsh Keshari | Ravindra Kumar Yadav
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Accurately inferring implicit physical attributes of products, such as weight, is critical for large-scale e-commerce logistics but challenging due to sparse or unreliable textual metadata and high visual variability. We formulate weight estimation as a grounded multimodal reasoning problem and investigate whether large vision-language models (LVLMs) can infer discretized weight buckets through in-context learning (ICL) over product images and descriptions. We introduce a scalable inference framework that conditions predictions on automatically retrieved, category-specific exemplars and propose a distribution-calibrated retrieval strategy that aligns few-shot contexts with the empirical weight distribution of each product sub-category. This calibration substantially improves few-shot multimodal reasoning compared to random or embedding-based retrieval baselines. Across 14 high-variance categories, our approach significantly outperforms strong multimodal KNN baselines in both exact-match accuracy and near-bucket reliability. Deployed in production on a large e-commerce platform, our system processes millions of listings daily and reduces shipping-related revenue leakage by ∼22%, demonstrating that multimodal ICL can serve as a practical and cost-effective alternative to manual or hardware-based verification.
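The abstract does not spell out how the distribution-calibrated retrieval works, so the following is only a minimal sketch of one plausible reading: choose k few-shot exemplars so that their weight-bucket mix matches the empirical bucket distribution of a sub-category. The function name `calibrated_exemplars`, the `(item_id, bucket)` pool format, and the largest-remainder allocation are all assumptions made for illustration, not the authors' method.

```python
import random
from collections import defaultdict


def calibrated_exemplars(pool, k, seed=0):
    """Pick k exemplars whose bucket proportions track the pool's.

    pool: list of (item_id, bucket) pairs for one sub-category
          (hypothetical format, assumed for this sketch).
    """
    if k > len(pool):
        raise ValueError("k must not exceed the pool size")
    rng = random.Random(seed)

    # Group candidate exemplars by their weight bucket.
    by_bucket = defaultdict(list)
    for item_id, bucket in pool:
        by_bucket[bucket].append(item_id)

    total = len(pool)
    # Largest-remainder allocation in integer arithmetic:
    # each bucket first gets floor(k * share) slots.
    alloc = {b: (k * len(items)) // total for b, items in by_bucket.items()}
    remainder = {b: (k * len(items)) % total for b, items in by_bucket.items()}

    # Hand the leftover slots to the buckets with the largest remainders.
    leftover = k - sum(alloc.values())
    for b in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[b] += 1

    # Sample without replacement inside each bucket.
    chosen = []
    for b, n in alloc.items():
        chosen.extend((item_id, b) for item_id in rng.sample(by_bucket[b], n))
    return chosen
```

In a retrieval pipeline, the random draw inside each bucket could be swapped for a nearest-neighbour lookup restricted to that bucket; the sketch only fixes how many exemplars each bucket contributes.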
2024
GeoIndia: A Seq2Seq Geocoding Approach for Indian Addresses
Bhavuk Singhal | Anshu Aditya | Lokesh Todwal | Shubham Jain | Debashis Mukherjee
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Geocoding, the conversion of unstructured geographic text into structured spatial data, is essential for logistics, urban planning, and location-based services. Indian addresses, with their diverse languages, scripts, and formats, present significant challenges that existing geocoding methods often fail to address, particularly at fine-grained resolutions. In this paper, we propose GeoIndia, a novel geocoding system designed specifically for Indian addresses using hierarchical H3-cell prediction within a Seq2Seq framework. Our methodology includes a comprehensive analysis of Indian addressing systems, leading to the development of a data correction strategy that enhances prediction accuracy. We investigate two model architectures, Flan-T5-base (T5) and Llama-3-8b (QLF-Llama-3), due to their strong sequence generation capabilities. We trained around 29 models, one dedicated to each state, and results show that our approach provides superior accuracy and reliability across multiple Indian states, outperforming the well-renowned geocoding platform Google Maps. In multiple states, we achieved more than a 50% reduction in mean distance error and more than an 85% reduction in 99th percentile distance error compared to Google Maps. This advancement can help in optimizing logistics in the e-commerce sector, reducing delivery failures and improving customer satisfaction.
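To make "hierarchical H3-cell prediction" concrete: an H3 cell index packs a base cell plus one digit (0 to 6) per resolution level, so a cell can be written as a coarse-to-fine sequence that a Seq2Seq decoder could emit token by token. The sketch below decodes that structure from the documented 64-bit H3 index layout using only integer arithmetic. The token format (`<b..>`, `<d..>`) and the function names are hypothetical and are not taken from the paper, which does not describe its target encoding here.

```python
def decode_h3(index_hex):
    """Split an H3 cell index (hex string) into its hierarchical parts.

    Bit layout of the 64-bit index, from the H3 documentation:
    1 reserved bit, 4 mode bits, 3 reserved bits, 4 resolution bits,
    7 base-cell bits, then fifteen 3-bit digits (resolutions 1..15).
    """
    value = int(index_hex, 16)
    mode = (value >> 59) & 0xF
    resolution = (value >> 52) & 0xF
    base_cell = (value >> 45) & 0x7F
    # The digit for resolution r sits 3 * (15 - r) bits from the low end.
    digits = [(value >> (3 * (15 - r))) & 0x7 for r in range(1, resolution + 1)]
    return mode, resolution, base_cell, digits


def to_target_tokens(index_hex):
    """Render a cell as a coarse-to-fine token sequence (hypothetical format)."""
    _, _, base_cell, digits = decode_h3(index_hex)
    return [f"<b{base_cell}>"] + [f"<d{d}>" for d in digits]
```

With such an encoding, every prefix of the token sequence names an ancestor cell, so a partially correct prediction still localizes the address to a coarser region.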
2023
IntenDD: A Unified Contrastive Learning Approach for Intent Detection and Discovery
Bhavuk Singhal | Ashim Gupta | V P Shivasankaran | Amrith Krishna
Findings of the Association for Computational Linguistics: EMNLP 2023
Identifying intents from dialogue utterances forms an integral component of task-oriented dialogue systems. Intent-related tasks are typically formulated either as a classification task, where the utterances are classified into predefined categories, or as a clustering task, when new and previously unknown intent categories need to be discovered from these utterances. Further, the intent classification may be modeled in a multiclass (MC) or multilabel (ML) setup. While these tasks are typically modeled separately, we propose IntenDD, a unified approach leveraging a shared utterance encoding backbone. IntenDD uses an entirely unsupervised contrastive learning strategy for representation learning, where pseudo-labels for the unlabeled utterances are generated based on their lexical features. Additionally, we introduce a two-step post-processing setup for the classification tasks using modified adsorption. Here, the residuals in the training data are first propagated, followed by smoothing of the labels, both modeled in a transductive setting. Through extensive evaluations on various benchmark datasets, we find that our approach consistently outperforms competitive baselines across all three tasks. On average, IntenDD reports percentage improvements of 2.32%, 1.26%, and 1.52% in their respective metrics for the few-shot MC, few-shot ML, and intent discovery tasks, respectively.
Scaling Neural ITN for Numbers and Temporal Expressions in Tamil: Findings for an Agglutinative Low-resource Language
Bhavuk Singhal | Sindhuja Gopalan | Amrith Krishna | Malolan Chetlur
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Inverse text normalization (ITN) involves rewriting the verbalised form of text from spoken transcripts to its corresponding written form. The task inherently poses challenges in identifying ITN entries due to spelling variations in words arising out of dialects, transcription errors, etc. Additionally, in Tamil, word boundaries between adjacent words in a sentence often get obscured due to Punarchi, i.e. phonetic transformation of these boundaries. Being morphologically rich, the words in Tamil show a high degree of agglutination due to inflection and clitics. The combination of such factors leads to a high degree of surface-form variations, making scalability with pure rule-based approaches difficult. Instead, we experiment with fine-tuning three pre-trained neural LMs, consisting of a seq2seq model (s2s), a non-autoregressive text editor (NAR) and a sequence tagger + rules combination (tagger). While the tagger approach works best in a fully-supervised setting, s2s performs the best (98.05 F-Score) when augmented with additional data, via bootstrapping and data augmentation (DA&B). s2s reports a cumulative percentage improvement of 20.1%, and all our models show statistically significant gains with DA&B. Compared to a fully supervised setup, bootstrapping alone reports a percentage improvement as high as 14.12%, even with a small seed set of 324 ITN entries.