Ayush Singh

Papers on this page may belong to the following people: Ayush Singh, Ayush Singh


2026

Contract compliance verification requires reasoning about cross-clause dependencies where obligations, exceptions, and conditions interact across multiple provisions, yet existing legal NLP benchmarks like ContractNLI and CUAD focus exclusively on isolated single-clause tasks. We introduce COMPACT (COMpliance PAralegals via Clause graph reasoning over conTracts), a framework that models cross-clause dependencies through structured clause graphs. Our approach extracts deontic-temporal entities from clauses and constructs typed relationship graphs capturing definitional dependencies, exception hierarchies, and temporal sequences. From these graphs, we introduce ACE (Assessing Compliance in Enterprise)- a benchmark containing 4,700 carefully constructed compliance scenarios derived from 633 real-world contracts covering 26 types of agreements. Each scenario requires multi-hop reasoning across multiple clauses, and undergoes independent LLM-based validation to ensure quality. Evaluation reveals that multi-clause reasoning poses a fundamental challenge for state-of-the-art models (34-57% base accuracy), while training on ACE yields substantial improvements on compliance tasks (+22–43 % points) and also enhances general legal reasoning performance on other benchmarks (PrivaCI-Bench, ContractNLI).

2024

Large language models (LLMs) have shown remarkable performance on many tasks in different domains. However, their performance in contextual biomedical machine reading comprehension (MRC) has not been evaluated in depth. In this work, we evaluate GPT on four contextual biomedical MRC benchmarks. We experiment with different conventional prompting techniques as well as introduce our own novel prompting method. To solve some of the retrieval problems inherent to LLMs, we propose a prompting strategy named Implicit Retrieval Augmented Generation (RAG) that alleviates the need for using vector databases to retrieve important chunks in traditional RAG setups. Moreover, we report qualitative assessments on the natural language generation outputs from our approach. The results show that our new prompting technique is able to get the best performance in two out of four datasets and ranks second in rest of them. Experiments show that modern-day LLMs like GPT even in a zero-shot setting can outperform supervised models, leading to new state-of-the-art (SoTA) results on two of the benchmarks.
Electronic health records (EHR) even though a boon for healthcare practitioners, are grow- ing convoluted and longer every day. Sifting around these lengthy EHRs is taxing and be- comes a cumbersome part of physician-patient interaction. Several approaches have been pro- posed to help alleviate this prevalent issue ei- ther via summarization or sectioning, however, only a few approaches have truly been helpful in the past. With the rise of automated methods, machine learning (ML) has shown promise in solving the task of identifying relevant sections in EHR. However, most ML methods rely on labeled data which is difficult to get in health- care. Large language models (LLMs) on the other hand, have performed impressive feats in natural language processing (NLP), that too in a zero-shot manner, i.e. without any labeled data. To that end, we propose using LLMs to identify relevant section headers. We find that GPT-4 can effectively solve the task on both zero and few-shot settings as well as segment dramatically better than state-of-the-art meth- ods. Additionally, we also annotate a much harder real world dataset and find that GPT-4 struggles to perform well, alluding to further research and harder benchmarks.

2023

Performing prior authorization on patients in a medical facility is a time-consuming and challenging task for insurance companies. Automating the clinical decisions that lead to authorization can reduce the time that staff spend executing such procedures. To better facilitate such critical decision making, we present an automated approach to predict one of the challenging tasks in the process called primary clinical indicator prediction, which is the outcome of this procedure. The proposed solution is to create a taxonomy to capture the main categories in primary clinical indicators. Our approach involves an important step of selecting what is known as the “primary indicator” – one of the several heuristics based on clinical guidelines that are published and publicly available. A taxonomy based PI classification system was created to help in the recognition of PIs from free text in electronic health records (EHRs). This taxonomy includes comprehensive explanations of each PI, as well as examples of free text that could be used to detect each PI. The major contribution of this work is to introduce a taxonomy created by three professional nurses with many years of experience. We experiment with several state-of-the-art supervised and unsupervised techniques with a focus on prior approval for spinal imaging. The results indicate that the proposed taxonomy is capable of increasing the performance of unsupervised approaches by up to 10 F1 points. Further, in the supervised setting, we achieve an F1 score of 0.61 using a conventional technique based on term frequency–inverse document frequency that outperforms other deep-learning approaches.