2025
Classifier-Augmented Generation for Structured Workflow Prediction
Thomas Gschwind | Shramona Chakraborty | Nitin Gupta | Sameep Mehta
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
ETL (Extract, Transform, Load) tools such as IBM DataStage allow users to visually assemble complex data workflows, but configuring stages and their properties remains time-consuming and requires deep tool knowledge. We propose a system that translates natural language descriptions into executable workflows, automatically predicting both the structure and detailed configuration of the flow. At its core lies a Classifier-Augmented Generation (CAG) approach that combines utterance decomposition with a classifier and stage-specific few-shot prompting to produce accurate stage predictions. These stages are then connected into non-linear workflows using edge prediction, and stage properties are inferred from sub-utterance context. We compare CAG against strong single-prompt and agentic baselines, showing improved accuracy and efficiency while substantially reducing token usage. Our architecture is modular, interpretable, and capable of end-to-end workflow generation, including robust validation steps. To our knowledge, this is the first system with a detailed evaluation across stage prediction, edge layout, and property generation for natural-language-driven ETL authoring.
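The following is an illustrative sketch (not the authors' implementation) of the CAG stage-prediction step as described in the abstract: decompose the utterance, classify each sub-utterance into a stage type, then configure each stage with a stage-specific few-shot prompt. All names (predict_stages, FEW_SHOT_EXAMPLES) and the toy decomposer/classifier/LLM stand-ins are assumptions.

from typing import Callable

# Hypothetical stage-specific few-shot examples, keyed by stage type.
FEW_SHOT_EXAMPLES = {
    "filter": ["Keep only rows where country = 'US' -> Filter(condition=...)"],
    "join":   ["Combine customer and order tables on id -> Join(key=...)"],
}

def predict_stages(utterance: str,
                   decompose: Callable[[str], list],
                   classify: Callable[[str], str],
                   llm: Callable[[str], str]) -> list:
    """For each sub-utterance, pick a stage type with the classifier and
    configure it via a stage-specific few-shot prompt."""
    stages = []
    for sub in decompose(utterance):              # utterance decomposition
        stage_type = classify(sub)                # classifier-augmented step
        examples = "\n".join(FEW_SHOT_EXAMPLES.get(stage_type, []))
        prompt = (f"Examples for {stage_type} stages:\n{examples}\n\n"
                  f"Sub-utterance: {sub}\nPredict the stage configuration:")
        stages.append({"type": stage_type, "config": llm(prompt), "source": sub})
    return stages

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without a real classifier or LLM.
    demo = predict_stages(
        "Read the sales file, filter US rows, and join with customers",
        decompose=lambda u: [s.strip() for s in u.split(",")],
        classify=lambda s: "filter" if "filter" in s else ("join" if "join" in s else "read"),
        llm=lambda p: "<stage config>",
    )
    print(demo)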
Group, Embed and Reason: A Hybrid LLM and Embedding Framework for Semantic Attribute Alignment
Shramona Chakraborty | Shashank Mujumdar | Nitin Gupta | Sameep Mehta | Ronen Kat | Itay Etelis | Mohamed Mahameed | Itai Guez | Rachel Tzoref-Brill
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
In enterprise systems, tasks like API integration, ETL pipeline creation, customer record merging, and data consolidation rely on accurately aligning attributes that refer to the same real-world concept but differ across schemas. This semantic attribute alignment is critical for enabling schema unification, reporting, and analytics. The challenge is amplified in schema-only settings where no instance data is available due to ambiguous names, inconsistent descriptions, and varied naming conventions. We propose a hybrid, unsupervised framework that combines the contextual reasoning of Large Language Models (LLMs) with the stability of embedding-based similarity and schema grouping to address token limitations and hallucinations. Our method operates solely on metadata and scales to large schemas by grouping attributes and refining LLM outputs through embedding-based enhancement, justification filtering, and ranking. Experiments on real-world healthcare schemas show strong performance, highlighting the effectiveness of the framework in privacy-constrained scenarios.
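A minimal sketch, under assumptions, of the hybrid pattern the abstract describes: shortlist attribute pairs by embedding similarity over metadata only, then have an LLM confirm each candidate with a justification that can be filtered downstream. The embed and llm callables are placeholders, not the paper's components.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def candidate_pairs(src_attrs, tgt_attrs, embed, threshold=0.75):
    """Shortlist attribute pairs whose metadata embeddings are close.
    Attributes are dicts with 'name' and 'description'; embed(text) -> vector."""
    src_vecs = [embed(f"{a['name']}: {a['description']}") for a in src_attrs]
    tgt_vecs = [embed(f"{a['name']}: {a['description']}") for a in tgt_attrs]
    pairs = []
    for i, sv in enumerate(src_vecs):
        for j, tv in enumerate(tgt_vecs):
            score = cosine(sv, tv)
            if score >= threshold:
                pairs.append((src_attrs[i]["name"], tgt_attrs[j]["name"], score))
    return sorted(pairs, key=lambda p: -p[2])

def refine_with_llm(pairs, llm):
    """Keep only pairs the LLM judges equivalent; the one-line justification
    supports later filtering and ranking."""
    kept = []
    for src, tgt, score in pairs:
        verdict = llm(f"Do '{src}' and '{tgt}' refer to the same real-world "
                      f"concept? Answer YES or NO with a one-line justification.")
        if verdict.strip().upper().startswith("YES"):
            kept.append((src, tgt, score, verdict))
    return kept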
Divide, Link, and Conquer: Recall-oriented Schema Linking for NL-to-SQL via Question Decomposition
Kiran Pradeep | Kirushikesh Db | Nishtha Madaan | Sameep Mehta | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Natural language to SQL (NL-to-SQL) systems are increasingly critical in industry for enabling non-technical users to access structured data efficiently, supporting faster decision-making and data accessibility. However, state-of-the-art systems often depend on large proprietary models, which introduce serious concerns around privacy. While open-source LLMs offer a viable substitute, high-performing variants (e.g., 70B or 405B) require substantial GPU memory, making them impractical for many production environments. Smaller open-source models that fit on a single 80GB GPU present a more deployable alternative, yet existing efforts to enhance their Text-to-SQL performance rely heavily on fine-tuning, limiting flexibility. We propose RoSL, a plug-and-play framework that improves SQL generation for smaller LLMs without any task-specific training. While schema linking is often omitted for larger models, we show it remains essential for smaller ones. Further, we are the first to apply question decomposition at the schema linking stage, rather than during SQL generation as in prior work, to address the precision-recall tradeoff. Our approach improves schema linking recall by 25.1% and execution accuracy by 8.2% on the BIRD benchmark using ibm-granite/granite-3.3-8b-instruct, making it an effective and industry-friendly NL-to-SQL solution. We further analyze RoSL’s latency–efficiency characteristics, showing that it maintains practical efficiency for real-world deployment.
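An illustrative sketch, under assumptions, of question decomposition applied at the schema-linking stage: link schema elements for each sub-question separately and take the union to favor recall, rather than linking once for the full question. The decompose and link callables stand in for LLM calls and are not RoSL's actual API.

def recall_oriented_schema_link(question, schema, decompose, link):
    """Return the union of schema elements linked for each sub-question."""
    linked = set()
    for sub_q in decompose(question):          # e.g. LLM-based decomposition
        linked |= set(link(sub_q, schema))     # per-sub-question schema linking
    return sorted(linked)

# Toy stand-ins so the sketch runs without a model:
schema = {"orders": ["id", "customer_id", "total"], "customers": ["id", "name"]}
result = recall_oriented_schema_link(
    "Which customers spent the most last year?",
    schema,
    decompose=lambda q: ["Which customers are there?", "How much did each spend?"],
    link=lambda q, s: ["customers.name"] if "customers" in q
                      else ["orders.total", "orders.customer_id"],
)
print(result)  # union of per-sub-question links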
Quality Assessment of Tabular Data using Large Language Models and Code Generation
Ashlesha Akella | Akshar Kaul | Krishnasuri Narayanam | Sameep Mehta
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Reliable data quality is crucial for downstream analysis of tabular datasets, yet rule-based validation often struggles with inefficiency, human intervention, and high computational costs. We present a three-stage framework that combines statistical inlier detection with LLM-driven rule and code generation. After filtering data samples through traditional clustering, we iteratively prompt LLMs to produce semantically valid quality rules and synthesize their executable validators through code-generating LLMs. To generate reliable quality rules, we aid LLMs with retrieval-augmented generation (RAG) by leveraging external knowledge sources and domain-specific few-shot examples. Robust guardrails ensure the accuracy and consistency of both rules and code snippets. Extensive evaluations on benchmark datasets confirm the effectiveness of our approach.
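A hypothetical sketch of the three-stage flow described above: statistically filter suspicious samples (here a simple z-score filter stands in for the clustering-based inlier step), prompt an LLM for a quality rule, then have a code-generating LLM emit a validator that is sanity-checked before use. The LLM callables and prompts are placeholders, not the paper's.

import pandas as pd

def suspicious_samples(df: pd.DataFrame, column: str, z: float = 3.0) -> pd.DataFrame:
    """Stage 1 (stand-in): keep rows far from the column mean so the LLM
    only sees candidate anomalies. Assumes a numeric column."""
    col = df[column]
    return df[(col - col.mean()).abs() > z * col.std()]

def generate_rule(llm, column, samples, retrieved_context=""):
    """Stage 2: ask the LLM for a semantically valid rule, optionally grounded
    in retrieved domain documentation (RAG)."""
    return llm(f"Column: {column}\nSuspicious values: {samples}\n"
               f"Context: {retrieved_context}\nWrite one data-quality rule.")

def generate_validator(code_llm, rule):
    """Stage 3: synthesize an executable validator and guard against broken code."""
    code = code_llm(f"Write a Python function validate(value) -> bool for: {rule}")
    namespace = {}
    try:
        exec(code, namespace)        # guardrail: generated code must compile and run
        validator = namespace["validate"]
        validator(0)                 # guardrail: smoke test on a dummy value
        return validator
    except Exception:
        return None                  # reject and re-prompt upstream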
CodeGenWrangler: Data Wrangling task automation using Code-Generating Models
Ashlesha Akella | Abhijit Manatkar | Krishnasuri Narayanam | Sameep Mehta
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Assuring the quality of tabular datasets is essential for the efficiency of diverse downstream tasks such as summarization and fact-checking. Data-wrangling tasks effectively address the challenges associated with structured data processing to improve the quality of tabular data. Traditional statistical methods handle numeric data efficiently but often fail to understand the semantic context of the textual data in tables. Deep learning approaches are resource-intensive, requiring task- and dataset-specific training. Addressing these shortcomings, we present an automated system that leverages LLMs to generate executable code for data-wrangling tasks like missing value imputation, error detection, and error correction. Our system aims to identify inherent patterns in the data while leveraging external knowledge, effectively addressing both memory-independent and memory-dependent tasks.
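A minimal, assumed sketch of the code-generation pattern described above: prompt a code-generating LLM with a small sample of the table and the wrangling task (here, missing-value imputation), then run the returned function on the full table. The code_llm callable and the prompt wording are placeholders, not the system's actual interface.

import pandas as pd

def wrangle(df: pd.DataFrame, task: str, code_llm) -> pd.DataFrame:
    """Ask a code-generating LLM for a transform(df) function and apply it."""
    sample = df.head(5).to_csv(index=False)       # give the model a small data preview
    prompt = (f"Table sample:\n{sample}\nTask: {task}\n"
              "Return a Python function transform(df) that performs the task.")
    code = code_llm(prompt)
    namespace = {"pd": pd}
    exec(code, namespace)                          # in practice: run in a sandbox
    return namespace["transform"](df)

# e.g. wrangle(df, "impute missing values in the 'city' column", code_llm=my_llm)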
Schema and Natural Language Aware In-Context Learning for Improved GraphQL Query Generation
Nitin Gupta | Manish Kesarwani | Sambit Ghosh | Sameep Mehta | Carlos Eberhardt | Dan Debrunner
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
GraphQL offers a flexible alternative to REST APIs, allowing precise data retrieval across multiple sources in a single query. However, generating complex GraphQL queries remains a significant challenge. Large Language Models (LLMs), while powerful, often produce suboptimal queries due to limited exposure to GraphQL schemas and their structural intricacies. Custom prompt engineering with in-context examples is a common approach to guide LLMs, but existing methods, like randomly selecting examples, often yield unsatisfactory results. While semantic similarity-based selection is effective in other domains, it falls short for GraphQL, where understanding schema-specific nuances is crucial for accurate query formulation. To address this, we propose a Schema and NL-Aware In-context Learning (SNAIL) framework that integrates both structural and semantic information from GraphQL schemas with natural language inputs, enabling schema-aware in-context learning. Unlike existing methods, our approach captures the complexities of GraphQL schemas to improve query generation accuracy. We validate this framework on a publicly available complex GraphQL test dataset, demonstrating notable performance improvements, with specific query classes showing up to a 20% performance improvement for certain LLMs. As GraphQL adoption grows, with Gartner predicting over 60% of enterprises will use it in production by 2027, this work addresses a critical need, paving the way for more efficient and reliable GraphQL query generation in enterprise applications.
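An illustrative sketch, under stated assumptions, of schema-and-NL-aware example selection: candidates are scored by a weighted mix of natural-language similarity and schema similarity (here a simple field-overlap proxy), and the top-k become in-context examples. The scoring details are placeholders, not the SNAIL implementation.

def schema_overlap(schema_a_fields, schema_b_fields):
    """Jaccard overlap over schema field names as a crude schema-similarity proxy."""
    a, b = set(schema_a_fields), set(schema_b_fields)
    return len(a & b) / max(len(a | b), 1)

def select_examples(query, query_schema_fields, candidates, nl_sim, k=3, alpha=0.5):
    """candidates: list of dicts with 'nl', 'schema_fields', 'graphql'.
    nl_sim(a, b) -> [0, 1] similarity between natural-language strings."""
    scored = []
    for c in candidates:
        score = (alpha * nl_sim(query, c["nl"])
                 + (1 - alpha) * schema_overlap(query_schema_fields, c["schema_fields"]))
        scored.append((score, c))
    return [c for _, c in sorted(scored, key=lambda x: -x[0])[:k]]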
2024
Sequential API Function Calling Using GraphQL Schema
Avirup Saha | Lakshmi Mandal | Balaji Ganesan | Sambit Ghosh | Renuka Sindhgatta | Carlos Eberhardt | Dan Debrunner | Sameep Mehta
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Function calling using Large Language Models (LLMs) is an active research area that aims to empower LLMs with the ability to execute APIs to perform real-world tasks. However, sequential function calling using LLMs with interdependence between functions is still under-explored. To this end, we introduce GraphQLRestBench, a dataset consisting of natural language utterances paired with function call sequences representing real-world REST API calls with variable mapping between functions. In order to represent the response structure of the functions in the LLM prompt, we use the GraphQL schema of the REST APIs. We also introduce a custom evaluation framework for our dataset consisting of four specially designed metrics. We evaluate various open-source LLMs on our dataset using few-shot Chain-of-Thought and ReAct prompting to establish a reasonable baseline.
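The example below is hypothetical (the actual GraphQLRestBench format may differ); it only illustrates the idea of an utterance paired with an ordered sequence of API calls whose arguments reference earlier results, plus one toy metric. The real dataset defines four specially designed metrics.

# Assumed item layout, for illustration only.
example = {
    "utterance": "Create a project and then add a task to it",
    "calls": [
        {"function": "createProject", "args": {"name": "Demo"}, "output_var": "p1"},
        {"function": "createTask", "args": {"projectId": "$p1.id", "title": "First task"}},
    ],
}

def exact_sequence_match(predicted_calls, gold_calls):
    """Toy metric: predicted function names must match the gold sequence in order."""
    return [c["function"] for c in predicted_calls] == [c["function"] for c in gold_calls]

print(exact_sequence_match(example["calls"], example["calls"]))  # True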
GraphQL Query Generation: A Large Training and Benchmarking Dataset
Manish Kesarwani | Sambit Ghosh | Nitin Gupta | Shramona Chakraborty | Renuka Sindhgatta | Sameep Mehta | Carlos Eberhardt | Dan Debrunner
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
GraphQL is a powerful query language for APIs that allows clients to fetch precise data efficiently and flexibly, querying multiple resources with a single request. However, crafting complex GraphQL query operations can be challenging. Large Language Models (LLMs) offer an alternative by generating GraphQL queries from natural language, but they struggle due to limited exposure to publicly available GraphQL schemas, often resulting in invalid or suboptimal queries. Furthermore, no benchmark test data suite is available to reliably evaluate the performance of contemporary LLMs. To address this, we present a large-scale, cross-domain Text-to-GraphQL query operation dataset. The dataset includes 10,940 training triples spanning 185 cross-source data stores and 957 test triples over 14 data stores. Each triple consists of a GraphQL schema, GraphQL query operation, and corresponding natural language query. The dataset has been predominantly manually created, with natural language paraphrasing, and carefully validated, requiring approximately 1200 person-hours. In our evaluation, we tested 10 state-of-the-art LLMs using our test dataset. The best-performing model achieved an accuracy of only around 50% with one in-context few-shot example, underscoring the necessity for custom fine-tuning. To support further research and benchmarking, we are releasing the training and test datasets under the MIT License. The dataset is available at https://github.com/stepzen-dev/NL2GQL.
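A hedged sketch of how such (schema, natural language, GraphQL) triples could be used for one-shot prompting, matching the one in-context example setting mentioned above. The field names and file layout here are assumptions, not the released dataset's actual format; see the linked repository for that.

def build_one_shot_prompt(example_triple: dict, target_triple: dict) -> str:
    """Assemble a prompt with one solved triple followed by the target question."""
    return (f"GraphQL schema:\n{example_triple['schema']}\n"
            f"Question: {example_triple['nl']}\n"
            f"Query: {example_triple['graphql']}\n\n"
            f"GraphQL schema:\n{target_triple['schema']}\n"
            f"Question: {target_triple['nl']}\n"
            "Query:")

# prompt = build_one_shot_prompt(train_triples[0], test_triples[0])  # hypothetical variables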
2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan | Rishabh Garg | Kahini Wadhawan | Sameep Mehta
Findings of the Association for Computational Linguistics: ACL 2023
We propose a method to control the attributes of Language Models (LMs) for the text generation task using Causal Average Treatment Effect (ATE) scores and counterfactual augmentation. We explore this method in the context of LM detoxification, and propose the Causally Fair Language (CFL) architecture for detoxifying pre-trained LMs in a plug-and-play manner. Our architecture is based on a Structural Causal Model (SCM) that is mathematically transparent and computationally efficient compared with many existing detoxification techniques. We also propose several new metrics that aim to better understand the behaviour of LMs in the context of toxic text generation. Further, we achieve state-of-the-art performance on toxic degeneration metrics, which are computed using Real Toxicity Prompts. Our experiments show that CFL achieves such detoxification without much impact on model perplexity. We also show that CFL mitigates the unintended bias problem through experiments on the BOLD dataset.
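A simplified, assumed illustration of a token-level ATE-style score: compare the toxicity of a sentence containing a token against counterfactual versions where the token is replaced, and average the difference. Here toxicity_score stands in for any toxicity classifier; this is not the paper's SCM-based estimator.

def token_ate(sentence_tokens, index, counterfactual_tokens, toxicity_score):
    """Estimate the treatment effect of the token at `index` on toxicity."""
    original = toxicity_score(" ".join(sentence_tokens))
    effects = []
    for alt in counterfactual_tokens:            # counterfactual augmentation
        cf = sentence_tokens[:index] + [alt] + sentence_tokens[index + 1:]
        effects.append(original - toxicity_score(" ".join(cf)))
    return sum(effects) / len(effects)           # average effect of keeping the token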
2013
An Empirical Assessment of Contemporary Online Media in Ad-Hoc Corpus Creation for Social Events
Kanika Narang | Seema Nagar | Sameep Mehta | L V Subramaniam | Kuntal Dey
Proceedings of the Sixth International Joint Conference on Natural Language Processing
NLP for uncertain data at scale
Sameep Mehta | L. V. Subramaniam
NAACL HLT 2013 Tutorial Abstracts