Shivam Ratnakar

2026

LatentGate: Low-Latency Semantic Routing via Frozen-Backbone Probing of Small Language Models
Shivam Ratnakar | Abhiroop Talasila | Vinayak K Doifode
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

As multi-agent systems scale to hundreds of specialized agents, routing becomes a critical bottleneck. Prompt-based LLM routers deliver strong semantic reasoning but incur prohibitive latency (~1500–2000 ms) and a cost that scales with agent count, while embedding-based routers are fast (25–50 ms) but collapse semantically similar yet functionally distinct agents. We identify *representation anisotropy*, the geometric collapse of hidden-state vectors into a narrow cone, as a key mechanism underlying embedding-based routing failure. We propose **LatentGate**, a non-generative router that extracts mean-pooled hidden states from a frozen small language model (SLM), applies PCA whitening to resolve the anisotropy, and trains a lightweight linear probe for agent classification. Across 5 SLM backbones and 100 enterprise agents, LatentGate achieves 98.8% in-domain and 80.0% out-of-distribution (OOD) accuracy on natural queries, 13–22 absolute points above embedding baselines, as well as 92.9% accuracy on CLINC150. Routing takes ~28 ms on a T4 GPU: the SLM forward pass is independent of agent count, and classification adds only a negligible O(Ck) term. We demonstrate the potential of the lightweight linear probe to enable sub-10 ms warm-start retraining from user feedback, providing a foundation for continual learning in production environments. Benchmarking prompt-based routing with GPT-4.1, GPT-4.1-nano, and Gemini 2.5 Flash confirms that accuracy degrades to 70–77% at 100 agents with 1500–2000 ms latency, motivating non-generative alternatives.
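The pipeline summarized above (mean-pooled hidden states, PCA whitening, a linear probe, warm-start retraining) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: synthetic vectors with a large shared offset stand in for frozen-SLM hidden states, the probe is assumed to be softmax regression trained by plain gradient descent, and all names and sizes (`mean_pool`, `PCAWhitener`, `train_probe`, `route`, `d`, `k`) are hypothetical.

```python
import numpy as np

def mean_pool(hidden, mask):
    """Average token hidden states, ignoring padding.
    hidden: (batch, seq, d); mask: (batch, seq), 1 for real tokens."""
    m = mask[..., None].astype(hidden.dtype)
    return (hidden * m).sum(axis=1) / np.clip(m.sum(axis=1), 1e-9, None)

def mean_cosine(x):
    """Average pairwise cosine similarity; values near 1 mean a narrow cone."""
    u = x / np.linalg.norm(x, axis=1, keepdims=True)
    n = len(x)
    return ((u @ u.T).sum() - n) / (n * (n - 1))

class PCAWhitener:
    """Center, project onto the top-k principal components, rescale to unit variance."""
    def __init__(self, k, eps=1e-5):
        self.k, self.eps = k, eps

    def fit(self, x):
        self.mu = x.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(x - self.mu, rowvar=False))
        idx = np.argsort(vals)[::-1][: self.k]  # eigh is ascending; keep largest k
        self.W = vecs[:, idx] / np.sqrt(vals[idx] + self.eps)
        return self

    def transform(self, x):
        return (x - self.mu) @ self.W

def train_probe(z, y, n_classes, W=None, b=None, lr=0.5, steps=300):
    """Softmax-regression probe via full-batch gradient descent.
    Passing an existing (W, b) warm-starts training instead of starting from zero."""
    n, k = z.shape
    W = np.zeros((k, n_classes)) if W is None else W.copy()
    b = np.zeros(n_classes) if b is None else b.copy()
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = z @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n
        W -= lr * z.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def route(z, W, b):
    # After the backbone, routing is one (k x C) matrix product: O(C*k) per query.
    return np.argmax(z @ W + b, axis=1)

# --- Synthetic demo (assumption: stands in for real SLM hidden states) ---
rng = np.random.default_rng(0)
d, seq, C, per_class, k = 64, 12, 10, 40, 16
shared = 10.0 * rng.normal(size=d)   # large common offset -> anisotropy
centers = rng.normal(size=(C, d))    # small per-agent offsets

def synth_batch(labels):
    tokens = shared + centers[labels][:, None, :] + 0.5 * rng.normal(size=(len(labels), seq, d))
    mask = np.ones((len(labels), seq))
    mask[:, 9:] = 0                  # treat the last 3 positions as padding
    return mean_pool(tokens, mask)

y = np.repeat(np.arange(C), per_class)
X = synth_batch(y)
whitener = PCAWhitener(k).fit(X)
Z = whitener.transform(X)
W, b = train_probe(Z, y, C)
print("mean cosine raw:", round(mean_cosine(X), 3))
print("mean cosine whitened:", round(mean_cosine(Z), 3))
print("train accuracy:", (route(Z, W, b) == y).mean())

# Warm start: a few extra steps on newly labeled feedback queries,
# continuing from the current probe (backbone and whitener stay frozen).
y_fb = rng.integers(0, C, size=20)
Z_fb = whitener.transform(synth_batch(y_fb))
W2, b2 = train_probe(Z_fb, y_fb, C, W=W, b=b, steps=10)
print("feedback accuracy after warm start:", (route(Z_fb, W2, b2) == y_fb).mean())
```

On this toy data the raw pooled vectors have mean pairwise cosine close to 1, while the whitened ones sit near 0, which is the anisotropy effect the abstract describes. Reading C as the number of agents and k as the whitened dimension (our assumption), only the probe depends on agent count, so warm-start retraining touches a k-by-C weight matrix and never the backbone.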

2024

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
Shreyas Basavatia | Keerthiram Murugesan | Shivam Ratnakar
Findings of the Association for Computational Linguistics: ACL 2024

Interactive fiction games have emerged as an important application for improving the generalization capabilities of language-based reinforcement learning (RL) agents. Existing environments for interactive fiction games are either domain-specific or time-consuming to generate, and they do not train RL agents to master a specific set of skills. In this work, we introduce STARLING, an interactive environment for self-supervised RL in text-based games. STARLING bootstraps text-based RL agents with automatically generated games, derived from a seed set of game ideas, to boost their performance and generalization toward reaching the goal of a target environment. These games let the agent hone its skills on a predefined set of tasks. We create and test an environment of 100 games generated with this automated framework, which uses large language models (GPT-3) and an interactive fiction game engine (based on Inform 7) and lets users generate more games under minimal human supervision. Experimental results with both human participants and baseline text-based RL agents reveal that current state-of-the-art text-based RL agents cannot use previously learned skills in new situations at the level humans can. These results reinforce STARLING’s potential to serve as a sandbox environment for further research in self-supervised text-based RL.

Venues

ACL (1)
Findings (1)