Krish Sharma
2026
TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination
Omar Naim | Krish Sharma | Niyar R Barman | Nicholas Asher
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) typically come with a fixed architecture, despite growing evidence that not all layers contribute equally to every downstream task. We introduce TALE (Task-Aware Layer Elimination), an inference-time method that improves task performance by selectively removing layers that are irrelevant or detrimental for a given task. TALE optimizes task-specific performance, yielding a task-optimized architecture without retraining. Across 9 tasks and 5 model families, under both zero-shot and few-shot settings, TALE consistently matches or surpasses baseline performance while simultaneously reducing computational costs. TALE also synergizes with fine-tuning, leading to further performance improvements. Computing TALE for a new task requires modest resources, making it a practical and deployable solution for task-specialized LLM inference.
2025
DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module
Krish Sharma | Niyar R Barman | Akshay Chaturvedi | Nicholas Asher
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
We look at reasoning on GSM8k, a dataset of short texts presenting primary school math problems. We find, with Mirzadeh et al. (2024), that current LLM progress on the dataset may not be explained by better reasoning but by exposure to a broader pretraining data distribution. We then introduce a novel information source to help models with less data or inferior training reason better: discourse structure. We show that discourse structure improves performance for models like Llama2 13b by up to 160%. Even for models that have most likely memorized the dataset, adding discourse structural information still improves predictions and dramatically improves large model performance on out-of-distribution examples.
2023
Counter Turing Test (CT²): AI-Generated Text Detection is Not as Easy as You May Think - Introducing AI Detectability Index
Megha Chakraborty | S.M Towhidul Islam Tonmoy | S M Mehedi Zaman | Krish Sharma | Niyar R Barman | Chandan Gupta | Shreya Gautam | Tanay Kumar | Vinija Jain | Aman Chadha | Amit P. Sheth | Amitava Das
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
With the rise of prolific ChatGPT, the risks and consequences of AI-generated text have increased alarmingly. This triggered a series of events, including an open letter, signed by thousands of researchers and tech leaders in March 2023, demanding a six-month moratorium on the training of AI systems more sophisticated than GPT-4. To address the inevitable question of ownership attribution for AI-generated artifacts, the US Copyright Office released a statement stating that “if the content is traditional elements of authorship produced by a machine, the work lacks human authorship and the office will not register it for copyright”. Furthermore, both the US and the EU governments have recently drafted their initial proposals regarding the regulatory framework for AI. Given this cynosural spotlight on generative AI, AI-generated text detection (AGTD) has emerged as a topic that has received immediate attention in research, with some initial methods having been proposed, soon followed by the emergence of techniques to bypass detection. This paper introduces the Counter Turing Test (CT²), a benchmark consisting of techniques aiming to offer a comprehensive evaluation of the robustness of existing AGTD techniques. Our empirical findings unequivocally highlight the fragility of the proposed AGTD methods under scrutiny. Amidst the extensive deliberations on policy-making for regulating AI development, it is of utmost importance to assess the detectability of content generated by LLMs. Thus, to establish a quantifiable spectrum facilitating the evaluation and ranking of LLMs according to their detectability levels, we propose the AI Detectability Index (ADI). We conduct a thorough examination of 15 contemporary LLMs, empirically demonstrating that larger LLMs tend to have a lower ADI, indicating that they are less detectable than smaller LLMs. We firmly believe that ADI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making.