Akshay Verma

2026

SELENE: Selective and Evidence-Weighted LLM Debating for Efficient and Reliable Reasoning
Akshay Verma | Swapnil Gupta | Deepak Gupta | Prateek Sircar | Siddharth Pillai
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)

Multi-Agent Debate (MAD) frameworks improve factual reliability in large language models (LLMs) by allowing agents to critique and refine one another’s reasoning. Yet existing MAD systems are computationally expensive and prone to degradation under prolonged debates due to redundant exchanges and unstable judging. We propose a lightweight, industry-deployable alternative that unifies Selective Debate Initiation (SDI) with Evidence-Weighted Self-Consistency (EWSC) for adaptive, debate-on-demand reasoning. SDI dynamically predicts when debate is necessary by detecting confidence-likelihood misalignment and semantic disagreement, skipping well-aligned queries to conserve computation. EWSC replaces a single-judge verdict with a variance-aware, evidence-weighted aggregation across paraphrased evaluations, yielding more stable factual judgments. Combined, SDI and EWSC reduce token consumption by nearly 50% while improving both accuracy and calibration. Evaluated on BoolQ, CosmosQA, and an internal QnA benchmark, our framework achieves higher factual robustness and efficiency, demonstrating that scalable, epistemically reliable multi-agent reasoning is practical for real-world LLM deployments.
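
The abstract does not give formulas for SDI or EWSC, so the following is only a minimal illustrative sketch of the two ideas it names: a gate that triggers debate on confidence-likelihood misalignment or disagreement, and a weighted aggregation that also reports variance. The function names, thresholds, and the exact weighting scheme are assumptions made for illustration and are not taken from the paper.

```python
def should_debate(confidence, likelihood, disagreement,
                  gap_threshold=0.2, disagreement_threshold=0.5):
    """Hypothetical debate gate (thresholds are illustrative assumptions).

    confidence   -- the model's stated confidence in [0, 1]
    likelihood   -- a likelihood-derived score for the answer in [0, 1]
    disagreement -- a semantic disagreement score between agents in [0, 1]
    """
    misaligned = abs(confidence - likelihood) > gap_threshold
    return misaligned or disagreement > disagreement_threshold


def evidence_weighted_vote(evaluations):
    """Aggregate (score, evidence_weight) pairs from paraphrased evaluations.

    Returns the weighted mean score and the weighted variance around it;
    a high variance signals an unstable judgment.
    """
    total_weight = sum(weight for _, weight in evaluations)
    if total_weight <= 0:
        raise ValueError("at least one evaluation needs a positive weight")
    mean = sum(score * weight for score, weight in evaluations) / total_weight
    variance = sum(weight * (score - mean) ** 2
                   for score, weight in evaluations) / total_weight
    return mean, variance
```

Under these assumptions, a query with aligned confidence and likelihood and low disagreement skips debate entirely, while debated queries receive an aggregated verdict whose variance can be inspected before it is trusted.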


ReflectiveRAG: Rethinking Adaptivity in Retrieval-Augmented Generation
Akshay Verma | Swapnil Gupta | Siddharth Pillai | Prateek Sircar | Deepak Gupta
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)

Retrieval-Augmented Generation (RAG) systems degrade sharply under extreme noise, where irrelevant or redundant passages dominate. Current methods (fixed top-k retrieval, cross-encoder reranking, or policy-based iteration) depend on static heuristics or costly reinforcement learning, failing to assess evidence sufficiency, detect subtle mismatches, or reduce redundancy, leading to hallucinations and poor grounding. We introduce ReflectiveRAG, a lightweight yet reasoning-driven architecture that enhances factual grounding through two complementary mechanisms: Self-Reflective Retrieval (SRR) and Contrastive Noise Removal (NR). SRR employs a small language model as a decision controller that iteratively evaluates evidence sufficiency, enabling adaptive query reformulation without fixed schedules or policy training. NR further refines retrieved content via embedding-based contrastive filtering, enforcing semantic sparsity and removing redundant or tangential passages. Evaluated on WebQuestions, HotpotQA (distractor setting), and InternalQA with 50M Common Crawl distractors, ReflectiveRAG achieves substantial gains over strong baselines, including DeepRAG, improving EM by +2.7 pp and F1 by +2.5 pp, while reducing evidence redundancy by 30.88% with only 18 ms additional latency. Ablation studies confirm that SRR and NR jointly drive both factual accuracy and efficiency, validating our central claim that retrieval reasoning and contrastive filtering can outperform large-scale policy optimization in RAG.
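
As a rough illustration of what embedding-based filtering of redundant or tangential passages can look like, the sketch below keeps a passage only if it is sufficiently similar to the query and not too similar to a passage already kept. This is a generic greedy filter written for illustration, not the paper's NR procedure; the function names and both thresholds are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_passages(query_vec, passage_vecs,
                    min_relevance=0.3, max_redundancy=0.9):
    """Return indices of passages kept, in order of decreasing relevance.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    relevance = [cosine(query_vec, vec) for vec in passage_vecs]
    ranked = sorted(range(len(passage_vecs)),
                    key=lambda i: relevance[i], reverse=True)
    kept = []
    for i in ranked:
        if relevance[i] < min_relevance:
            break  # remaining passages are even less relevant (tangential)
        # Drop the passage if it nearly duplicates one already kept.
        if all(cosine(passage_vecs[i], passage_vecs[j]) <= max_redundancy
               for j in kept):
            kept.append(i)
    return kept
```

With a query vector `[1, 0]` and passages `[[1, 0], [0.99, 0.05], [0, 1], [0.7, 0.7]]`, the second passage is dropped as a near-duplicate of the first and the third as irrelevant.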


NEST: Nested Evidence Survival for Retrieval
Akshay Verma | Siddharth Pillai | Prateek Sircar | Deepak Gupta
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

Retrieval-Augmented Generation (RAG) systems degrade sharply under extreme noise, where relevant evidence is sparse and easily pruned by static retrieval decisions. Existing approaches (fixed top-k retrieval, hierarchical chunking, cross-encoder reranking, or policy-based iterative control) either rely on rigid heuristics or incur substantial computational overhead, and often fail to recover context-dependent evidence without introducing redundancy or latency. We introduce NEST (Nested Evidence Survival for Retrieval), a lightweight, training-free RAG framework that improves factual grounding by explicitly separating recall amplification from precision selection. NEST first maximizes recall through Nested Evidence Survival, evaluating candidates under nested retrieval contexts to rescue evidence that would otherwise be pruned by static chunking. It then applies a survival-consistent Mean Reciprocal Rank (MRR) selection mechanism to retain evidence that remains salient across retrieval scopes, removing redundancy without harming recall. Evaluated on WebQuestions, HotpotQA (distractor setting), and a proprietary InternalQA benchmark with 50M Common Crawl distractors, NEST consistently outperforms strong adaptive RAG baselines, including DeepRAG, improving EM by up to +2.4 pp and F1 by +2.1 pp, while increasing retrieval recall by +6.8 pp. These gains are achieved with only 12–18 ms additional latency. Ablation studies confirm that Nested Evidence Survival drives recall improvements, while MRR-based selection converts these gains into precision, demonstrating that recall-first retrieval with principled pruning can outperform iterative control and model scaling in retrieval-augmented generation.
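
The abstract does not define "survival-consistent" MRR selection, so the sketch below shows only one plausible reading: each candidate is ranked under several retrieval scopes, its reciprocal ranks are averaged (contributing zero where it does not appear), and the top-scoring candidates are kept. The function names, the zero contribution for absent candidates, and the tie-breaking rule are assumptions for illustration.

```python
def mean_reciprocal_rank(rankings):
    """Average reciprocal rank of each candidate across ranked lists.

    rankings -- one ranked list of candidate ids per retrieval scope;
                each id is assumed to appear at most once per list.
    A candidate missing from a scope contributes 0 for that scope.
    """
    if not rankings:
        return {}
    totals = {}
    for ranking in rankings:
        for position, candidate in enumerate(ranking, start=1):
            totals[candidate] = totals.get(candidate, 0.0) + 1.0 / position
    return {candidate: total / len(rankings)
            for candidate, total in totals.items()}


def select_survivors(rankings, k):
    """Keep the k candidates with the highest mean reciprocal rank.

    Ties are broken by candidate id so the output is deterministic.
    """
    scores = mean_reciprocal_rank(rankings)
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]
```

A candidate that ranks highly in every scope scores near 1, while one that appears in a single scope is penalized, which is the sense in which evidence that "remains salient across retrieval scopes" is favored in this sketch.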

