Samarth Bhargav


2025

pdf bib
A Comprehensive Taxonomy of Negation for NLP and Neural Retrievers
Roxana Petcu | Samarth Bhargav | Maarten de Rijke | Evangelos Kanoulas
Findings of the Association for Computational Linguistics: EMNLP 2025

Understanding and solving complex reasoning tasks is vital for addressing the information needs of a user. Although dense neural models learn contextualised embeddings, they underperform on queries containing negation. To understand this phenomenon, we study negation in traditional neural information retrieval and LLM-based models. We (1) introduce a taxonomy of negation that derives from philosophical, linguistic, and logical definitions; (2) generate two benchmark datasets that can be used to evaluate the performance of neural information retrieval models and to fine-tune models for a more robust performance on negation; and (3) propose a logic-based classification mechanism that can be used to analyze the performance of retrieval models on existing datasets. Our taxonomy produces a balanced data distribution over negation types, providing a better training setup that leads to faster convergence on the NevIR dataset. Moreover, we propose a classification schema that reveals the coverage of negation types in existing datasets, offering insights into the factors that might affect the generalization of fine-tuned models on negation. Our code is publicly available on GitHub, and the datasets are available on HuggingFace.

2022

pdf bib
Towards Reproducible Machine Learning Research in Natural Language Processing
Ana Lucic | Maurits Bleeker | Samarth Bhargav | Jessica Forde | Koustuv Sinha | Jesse Dodge | Sasha Luccioni | Robert Stojnic
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

While recent progress in the field of ML has been significant, the reproducibility of these cutting-edge results is often lacking, with many submissions lacking the necessary information in order to ensure subsequent reproducibility. Despite proposals such as the Reproducibility Checklist and reproducibility criteria at several major conferences, the reflex for carrying out research with reproducibility in mind is lacking in the broader ML community. We propose this tutorial as a gentle introduction to ensuring reproducible research in ML, with a specific emphasis on computational linguistics and NLP. We also provide a framework for using reproducibility as a teaching tool in university-level computer science programs.

2021

pdf bib
Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing
Vera Provatorova | Samarth Bhargav | Svitlana Vakulenko | Evangelos Kanoulas
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Entity disambiguation (ED) is the last step of entity linking (EL), when candidate entities are reranked according to the context they appear in. All datasets for training and evaluating models for EL consist of convenience samples, such as news articles and tweets, that propagate the prior probability bias of the entity distribution towards more frequently occurring entities. It was shown that the performance of the EL systems on such datasets is overestimated since it is possible to obtain higher accuracy scores by merely learning the prior. To provide a more adequate evaluation benchmark, we introduce the ShadowLink dataset, which includes 16K short text snippets annotated with entity mentions. We evaluate and report the performance of popular EL systems on the ShadowLink benchmark. The results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation, demonstrating the effect of prior probability bias and entity overshadowing.