Shiva Krishna Reddy Malay
2026
Augmenting LLM Reasoning with Dynamic Notes Writing for Complex Multi-Hop QA
Rishabh Maheshwary | Masoud Hashemi | Khyati Mahajan | Shiva Krishna Reddy Malay | Sai Rajeswar Mudumba | Sathwik Tejaswi Madhusudhan | Spandana Gella | Vikas Yadav
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Iterative RAG for multi-hop question answering faces challenges with lengthy contexts and the buildup of irrelevant information, which hinders a model's capacity to process and reason over retrieved content and limits performance. While recent methods focus on compressing retrieved information, they are either restricted to single-round RAG, require fine-tuning, or lack scalability in iterative RAG. To address these limitations, we propose NotesWriting, a method that generates concise and relevant notes from retrieved documents at each step, thereby reducing noise and retaining only essential information. This increases the effective context length of Large Language Models (LLMs), allowing them to reason and plan more effectively while processing larger volumes of input text, since the retrieved content is compressed into notes. NotesWriting is framework-agnostic and can be integrated with different iterative RAG methods. We demonstrate its effectiveness with three iterative RAG methods, across two models and four evaluation datasets. NotesWriting yields an average improvement of 15.6 percentage points overall by scaling the amount of ingested information.
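The abstract's iterative notes-writing loop can be sketched as follows. This is a minimal illustration of the control flow only, assuming hypothetical `retrieve` and `llm` stubs in place of a real retriever and LLM; it is not the paper's implementation.

```python
def retrieve(query):
    # Hypothetical retriever stub: returns raw documents for a query.
    return [f"document about {query}"]

def llm(prompt):
    # Hypothetical LLM stub: truncation stands in for summarization.
    return prompt[:80]

def notes_writing_qa(question, max_hops=3):
    """Sketch of iterative RAG where concise notes, not full documents,
    are carried forward as context at each retrieval step."""
    notes = []
    query = question
    for _ in range(max_hops):
        docs = retrieve(query)
        # Write a short, relevant note instead of appending full documents.
        note = llm(f"Summarize what is relevant to '{question}': {docs}")
        notes.append(note)
        # Plan the next sub-query from the accumulated notes.
        query = llm(f"Next sub-question for '{question}' given notes: {notes}")
    # Answer from the compact notes rather than the raw retrieved text.
    return llm(f"Answer '{question}' using notes: {notes}")

answer = notes_writing_qa("Who directed the film whose sequel won an Oscar?")
```

The key design point reflected here is that the per-step note, not the retrieved documents themselves, is what accumulates in the model's context, which is how the effective context length grows.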
2025
Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
Aman Tiwari | Shiva Krishna Reddy Malay | Vikas Yadav | Masoud Hashemi | Sathwik Tejaswi Madhusudhan
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Graph databases like Neo4j are gaining popularity over traditional relational databases for handling complex, interconnected data and for modeling and querying relationships. While translating natural language into SQL queries is well researched, generating Cypher queries for Neo4j remains relatively underexplored. In this work, we present an automated, LLM-supervised pipeline to generate high-quality synthetic data for Text2Cypher. Our Cypher data generation pipeline introduces LLM-As-Database-Filler, a novel strategy for ensuring Cypher query correctness, resulting in high-quality generations. Using our pipeline, we generate a high-quality Text2Cypher dataset, SynthCypher, containing 29.8k instances across various domains and queries of varying complexity. Training open-source LLMs such as LLaMa-3.1-8B, Mistral-7B, and QWEN-7B on SynthCypher yields performance gains of up to 40% on the Text2Cypher test split and 30% on the SPIDER benchmark adapted for graph databases.
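To make the Text2Cypher task concrete, the following is a hypothetical example of the kind of (schema, question, query) instance such a dataset could contain; the schema, question, and Cypher query are illustrative and not taken from the released SynthCypher data.

```python
# Hypothetical Text2Cypher instance: natural-language question paired
# with a Cypher query against a small movie-graph schema.
instance = {
    "schema": "(:Person)-[:ACTED_IN]->(:Movie)",
    "question": "Which movies did Tom Hanks act in?",
    "cypher": (
        "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) "
        "RETURN m.title"
    ),
}

# In the LLM-As-Database-Filler strategy as described in the abstract,
# correctness would be checked by populating a database consistent with
# the schema and executing the query; here we only sanity-check the text.
assert instance["cypher"].startswith("MATCH")
```

The verification idea is that executing the query against a filled database catches queries that are syntactically valid but semantically wrong, which a purely textual check cannot.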