Gokul Swamy
2026
Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
Lintang Sutawika | Gokul Swamy | Steven Wu | Graham Neubig
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
When asked a question in a language less seen in their training data, current reasoning large language models (RLMs) often exhibit dramatically lower performance than when asked the same question in English. In response, we introduce Self-Play with Privileged Pairwise Feedback, a two-stage framework for enhancing multilingual reasoning without any data in the target language(s). First, we perform supervised fine-tuning (SFT) on translated versions of English question-answer pairs to raise base model correctness. Second, we perform RL with feedback from a pairwise judge in a self-play fashion, with the judge receiving the English reference response as privileged information. Thus, even when none of the model’s responses are completely correct, the privileged pairwise judge can still tell which response is better. End-to-end, the framework greatly improves base model performance, even outperforming fully post-trained models on multiple math and non-math tasks with less than 1/8 of the training data, across the single-language, multilingual, and unseen-language generalization settings.
2025
An Address Intelligence Framework for E-commerce Deliveries
Gokul Swamy | Aman Gulati | Srinivas Virinchi | Anoop Saladi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
For an e-commerce domain, the customer address is the single most important piece of customer data for ensuring accurate and reliable deliveries. In this two-part study, we first outline the construction of a language model to assist customers with address standardization, and in the latter part, we detail a novel Pareto-ensemble multi-task prediction algorithm that derives critical insights from customer addresses to minimize operational losses arising from a given geographical area. Finally, we demonstrate the potential benefits of the proposed address intelligence system for a large e-commerce domain through large-scale experiments on a commercial system.
VADE: Visual Attention Guided Hallucination Detection and Elimination
Vishnu Prabhakaran | Purav Aggarwal | Vinay Kumar Verma | Gokul Swamy | Anoop Saladi
Findings of the Association for Computational Linguistics: ACL 2025
Vision Language Models (VLMs) have achieved significant advancements in complex visual understanding tasks. However, VLMs are prone to hallucinations, generating outputs that lack alignment with visual content. This paper addresses hallucination detection in VLMs by leveraging the visual grounding information encoded in transformer attention maps. We identify three primary challenges in this approach: the elective nature of visual grounding for certain tokens, the high-dimensional and noisy nature of attention maps, and the dynamic sequence length of attention on previous tokens. To address these, we propose VADE, a novel sequence modelling approach to effectively learn complex sequential patterns from high-dimensional and noisy attention maps for fine-grained hallucination detection and mitigation. VADE achieves an average PR-AUC of 80% in hallucination detection on M-HalDetect across four different model architectures and a 5% improvement in hallucination mitigation on MSCOCO.