Ashutosh Adhikari

2025

pdf bib abs
Debating for Better Reasoning in Vision-Language Models
Ashutosh Adhikari | Mirella Lapata
Findings of the Association for Computational Linguistics: EMNLP 2025

As Large Language Models (LLMs) gain expertise across diverse domains and modalities, scalable oversight becomes increasingly challenging, particularly when their capabilities may surpass human evaluators. Debate has emerged as a promising mechanism for enabling such oversight. We extend the debate paradigm to a multimodal setting, exploring its potential for blind models to supervise and enhance the performance of sighted ones. We focus on visual question answering (VQA), where two “sighted” expert vision-language models debate an answer, while a “blind” (text-only) judge adjudicates based solely on the quality of the arguments. In our framework, the experts only defend answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement. Experiments on several multimodal tasks demonstrate that the debate framework consistently outperforms individual expert models. Moreover, judgments from blind LLMs can be used to instil reasoning capabilities in vision-language models through fine-tuning.

2020

pdf bib abs
Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT
Ashutosh Adhikari | Achyudh Ram | Raphael Tang | William L. Hamilton | Jimmy Lin
Proceedings of the 5th Workshop on Representation Learning for NLP

Fine-tuned variants of BERT are able to achieve state-of-the-art accuracy on many natural language processing tasks, although at significant computational costs. In this paper, we verify BERT’s effectiveness for document classification and investigate the extent to which BERT-level effectiveness can be obtained by different baselines, combined with knowledge distillation—a popular model compression method. The results show that BERT-level effectiveness can be achieved by a single-layer LSTM with at least 40× fewer FLOPS and only ∼3% parameters. More importantly, this study analyzes the limits of knowledge distillation as we distill BERT’s knowledge all the way down to linear models—a relevant baseline for the task. We report substantial improvement in effectiveness for even the simplest models, as they capture the knowledge learnt by BERT.

2019

pdf bib abs
Rethinking Complex Neural Network Architectures for Document Classification
Ashutosh Adhikari | Achyudh Ram | Raphael Tang | Jimmy Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Neural network models for many NLP tasks have grown increasingly complex in recent years, making training and deployment more difficult. A number of recent papers have questioned the necessity of such architectures and found that well-executed, simpler models are quite effective. We show that this is also the case for document classification: in a large-scale reproducibility study of several recent neural models, we find that a simple BiLSTM architecture with appropriate regularization yields accuracy and F1 that are either competitive or exceed the state of the art on four standard benchmark datasets. Surprisingly, our simple model is able to achieve these results without attention mechanisms. While these regularization techniques, borrowed from language modeling, are not novel, to our knowledge we are the first to apply them in this context. Our work provides an open-source platform and the foundation for future work in document classification.

Co-authors

Venues

Fix data