Mohit Tuteja


2026

Retrieval-Augmented Generation (RAG) systems depend critically on retrieval quality to produce accurate, contextually relevant LLM responses. While LLMs excel at synthesis, their RAG performance is bottlenecked by document relevance. We evaluate advanced retrieval techniques, including embedding model comparison, Reciprocal Rank Fusion (RRF), embedding concatenation, and list-wise and adaptive LLM-based re-ranking, demonstrating that zero-shot LLMs outperform traditional cross-encoders in identifying high-relevance passages. We also explore context-aware embeddings, diverse chunking strategies, and model fine-tuning. All methods are rigorously evaluated on a proprietary dataset powering our deployed production chatbot, with validation on three public benchmarks: FiQA, HotpotQA, and SciDocs. Results show consistent gains in Recall@10, closing the gap with Recall@50 and yielding actionable pipeline recommendations. By prioritizing retrieval enhancements, we significantly elevate downstream LLM response quality in real-world, customer-facing applications.
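For readers unfamiliar with Reciprocal Rank Fusion, the standard formulation scores each document as the sum of 1/(k + rank) over the ranked lists being fused (k = 60 is the commonly used constant). A minimal sketch, with hypothetical document IDs and two illustrative retriever outputs, not the paper's exact pipeline:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document's fused score is the sum over input lists of
    1 / (k + rank), with 1-based ranks; k=60 is the usual constant.
    Documents ranked highly by multiple retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a dense and a sparse retriever:
dense  = ["d2", "d1", "d3"]
sparse = ["d2", "d1", "d4"]
print(reciprocal_rank_fusion([dense, sparse]))  # → ['d2', 'd1', 'd3', 'd4']
```

Because RRF uses only ranks, not raw scores, it fuses retrievers whose score scales are incomparable (e.g. BM25 vs. cosine similarity) without any calibration.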

2023

In the legal domain, we often perform classification tasks on very long documents, such as court judgements. These documents frequently run to thousands of words, and this length poses a challenge for the modelling task. In this paper, we present a comprehensive evaluation of strategies for long text classification using Transformers, in conjunction with strategies for selecting document chunks using traditional NLP models. We conduct our experiments on 6 benchmark datasets comprising lengthy documents, 4 of which are publicly available; each dataset has a median word count exceeding 1,000. Our evaluation encompasses state-of-the-art Transformer models, such as RoBERTa, Longformer, HAT, MEGA, and LegalBERT, and compares them with a traditional TF-IDF + Neural Network (NN) baseline. We investigate the effectiveness of pre-training on large corpora, fine-tuning strategies, and transfer learning techniques in the context of long text classification.
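One common way to combine a traditional NLP model with a Transformer's limited input window is to score fixed-size chunks of the document with TF-IDF weights and keep only the highest-scoring chunks for the classifier. The sketch below is illustrative only, not the paper's exact procedure; the corpus, chunk size, and scoring are assumptions:

```python
import math
import re
from collections import Counter

def tfidf_chunk_select(document, corpus, chunk_words=128, top_k=2):
    """Split a long document into fixed-size word chunks and keep the
    top_k chunks whose terms carry the highest average TF-IDF weight.

    Illustrative sketch: a real pipeline would fit the weights on a
    training corpus and feed the kept chunks to a Transformer.
    """
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    # Inverse document frequency over the reference corpus.
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    idf = {term: math.log(len(corpus) / df[term]) for term in df}

    words = tokenize(document)
    chunks = [words[i:i + chunk_words] for i in range(0, len(words), chunk_words)]

    def score(chunk):
        tf = Counter(chunk)
        return sum(tf[t] * idf.get(t, 0.0) for t in tf) / max(len(chunk), 1)

    ranked = sorted(range(len(chunks)), key=lambda i: score(chunks[i]), reverse=True)
    keep = sorted(ranked[:top_k])  # preserve document order among kept chunks
    return [" ".join(chunks[i]) for i in keep]
```

Chunks dominated by corpus-wide boilerplate score low, while chunks containing document-specific terms are retained, which keeps the Transformer input within its token budget.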