Manoj Apte

2025

pdf bib abs
QueryShield: A Platform to Mitigate Enterprise Data Leakage in Queries to External LLMs
Nitin Ramrakhiyani | Delton Myalil | Sachin Pawar | Manoj Apte | Rajan M A | Divyesh Saglani | Imtiyazuddin Shaik
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

Unrestricted access to external Large Language Models (LLM) based services like ChatGPT and Gemini can lead to potential data leakages, especially for large enterprises providing products and services to clients that require legal confidentiality guarantees. However, a blanket restriction on such services is not ideal as these LLMs boost employee productivity. Our goal is to build a solution that enables enterprise employees to query such external LLMs, without leaking confidential internal and client information. In this paper, we propose QueryShield - a platform that enterprises can use to interact with external LLMs without leaking data through queries. It detects if a query leaks data, and rephrases it to minimize data leakage while limiting the impact to its semantics. We construct a dataset of 1500 queries and manually annotate them for their sensitivity labels and their low sensitivity rephrased versions. We fine-tune a set of lightweight model candidates using this dataset and evaluate them using multiple metrics including one we propose specific to this problem.

2024

pdf bib abs
Why Generate When You Can Discriminate? A Novel Technique for Text Classification using Language Models
Sachin Pawar | Nitin Ramrakhiyani | Anubhav Sinha | Manoj Apte | Girish Palshikar
Findings of the Association for Computational Linguistics: EACL 2024

In this paper, we propose a novel two-step technique for text classification using autoregressive Language Models (LM). In the first step, a set of perplexity and log-likelihood based numeric features are elicited from an LM for a text instance to be classified. Then, in the second step, a classifier based on these features is trained to predict the final label. The classifier used is usually a simple machine learning classifier like Support Vector Machine (SVM) or Logistic Regression (LR) and it is trained using a small set of training examples. We believe, our technique presents a whole new way of exploiting the available training instances, in addition to the existing ways like fine-tuning LMs or in-context learning. Our approach stands out by eliminating the need for parameter updates in LMs, as required in fine-tuning, and does not impose limitations on the number of training examples faced while building prompts for in-context learning. We evaluate our technique across 5 different datasets and compare with multiple competent baselines.

2023

pdf bib abs
Audit Report Coverage Assessment using Sentence Classification
Sushodhan Vaishampayan | Nitin Ramrakhiyani | Sachin Pawar | Aditi Pawde | Manoj Apte | Girish Palshikar
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

Audit reports are a window to the financial health of a company and hence gauging coverage of various audit aspects in them is important. In this paper, we aim at determining an audit report’s coverage through classification of its sentences into multiple domain specific classes. In a weakly supervised setting, we employ a rule-based approach to automatically create training data for a BERT-based multi-label classifier. We then devise an ensemble to combine both the rule based and classifier approaches. Further, we employ two novel ways to improve the ensemble’s generalization: (i) through an active learning based approach and, (ii) through a LLM based review. We demonstrate that our proposed approaches outperform several baselines. We show utility of the proposed approaches to measure audit coverage on a large dataset of 2.8K audit reports.

Manoj Apte

2025

2024

2023

2016

Co-authors

Venues