Nicola Tonellotto
2026
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Florin Cuconasu | Giovanni Trappolini | Nicola Tonellotto | Fabrizio Silvestri
Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)
Retrieval-Augmented Generation (RAG) represents a significant advancement in artificial intelligence, combining a retrieval phase with a generative phase, with the latter typically being powered by Large Language Models (LLMs). Common wisdom and practices in RAG involve using "instructed" LLMs, which are fine-tuned with supervised training to enhance their ability to follow instructions and are aligned with human preferences using state-of-the-art techniques. However, contrary to this popular belief, our study demonstrates that base models outperform their instructed counterparts in RAG tasks by 20% on average under our experimental settings. This finding challenges the prevailing assumptions about the superiority of instructed LLMs in RAG applications. Further investigations reveal a more complex situation, questioning fundamental aspects of RAG and suggesting the need for broader discussions on the topic; or, as Fromm would have it, "Seldom is a glance at the statistics enough to understand the meaning of the figures".
The Mechanics of Interference: Defusing Distractors in RAG via Sparse Autoencoder Interventions
Christian Giannetti | Giovanni Trappolini | Nicola Tonellotto | Fabrizio Silvestri | Pietro Lio
Findings of the Association for Computational Linguistics: ACL 2026
Large language models exhibit a critical vulnerability to distractor interference in retrieval-augmented contexts: they fail to prioritize relevant, factually correct documents over topically similar but misleading content. We introduce Lat-Defuse, a mechanistic framework that corrects this failure mode through targeted interventions in the model’s latent space. Using Sparse Autoencoders (SAEs), our method operates in an interpretable feature space and formulates correction as constrained counterfactual optimization. On Gemma-2 and Llama-3 model families across three QA benchmarks (BioASQ, Natural Questions, PopQA), our method achieves recovery rates of up to 94% on distractor-vulnerable samples. Successful correction through sparse modifications reveals distractor interference as a localized, systematically addressable phenomenon, opening directions toward universal distractor robustness in LLMs.
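Sparse-feature interventions of the kind this abstract describes act on SAE activations rather than directly on hidden states. The following is a minimal, hedged sketch of a generic SAE edit. It assumes a standard ReLU SAE, with f = ReLU(W_enc(h - b_dec) + b_enc) and reconstruction W_dec f + b_dec, and a hand-chosen feature edit. The constrained counterfactual optimization that Lat-Defuse uses to choose the edit is not reproduced here, and every name below is hypothetical.

```python
from typing import Dict, List

Vec = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vec) -> Vec:
    """Plain matrix-vector product (rows of m dotted with v)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_encode(h: Vec, w_enc: Matrix, b_enc: Vec, b_dec: Vec) -> Vec:
    """Assumed ReLU SAE encoder: f = ReLU(W_enc (h - b_dec) + b_enc)."""
    centered = [x - b for x, b in zip(h, b_dec)]
    return [max(0.0, p + b) for p, b in zip(matvec(w_enc, centered), b_enc)]


def intervene(h: Vec, edits: Dict[int, float], w_enc: Matrix, b_enc: Vec,
              w_dec: Matrix, b_dec: Vec) -> Vec:
    """Overwrite a few feature activations (hypothetical hand-picked edits)
    and add only the decoded *change* back to h, so the SAE's
    reconstruction error is left untouched."""
    f = sae_encode(h, w_enc, b_enc, b_dec)
    f_edit = list(f)
    for idx, value in edits.items():
        f_edit[idx] = value
    # The decoder is affine, so the change in reconstruction is W_dec (f_edit - f).
    delta = matvec(w_dec, [a - b for a, b in zip(f_edit, f)])
    return [x + d for x, d in zip(h, delta)]


# Toy SAE: 2-dimensional hidden state, 3 sparse features.
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, -1.0]
w_dec = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
b_dec = [0.0, 0.0]

h = [2.0, 0.5]
steered = intervene(h, {2: 0.0}, w_enc, b_enc, w_dec, b_dec)  # switch off feature 2
print(steered)  # [1.25, -0.25]
```

Adding only the decoded difference, rather than replacing h with the full reconstruction, is a common design choice in SAE steering: it confines the intervention to the features that were edited.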
Statistical Foundations of DIME: Risk Estimation for Practical Index Selection
Giulio D'Erasmo | Cesare Campagnano | Antonio Mallia | Pierpaolo Brutti | Nicola Tonellotto | Fabrizio Silvestri
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
High-dimensional dense embeddings have become central to modern Information Retrieval, but many dimensions are noisy or redundant. The recently proposed DIME (Dimension IMportance Estimation) provides query-dependent scores to identify the informative components of embeddings. However, DIME relies on a costly grid search to select, a priori, a single dimensionality for all embeddings in the query corpus. Our work provides a statistically grounded criterion that directly identifies the optimal set of dimensions for each query at inference time. Experiments confirm that this approach improves retrieval effectiveness and reduces embedding size by 50% on average across different models and datasets at inference time.
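The baseline setting this abstract improves on, which uses query-dependent dimension scores with one fixed dimensionality for every query, can be sketched as follows. This is a minimal sketch under stated assumptions. It assumes a pseudo-relevance-feedback style estimator, in which the importance of dimension i is the query value times the mean value of the top-k retrieved documents in that dimension. It also assumes a global `keep_fraction`, standing in for the a priori dimensionality that DIME tunes by grid search. The function names are hypothetical. The paper's statistical per-query criterion is not reproduced here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def prf_dime_importance(query: Vector, top_docs: List[Vector]) -> List[float]:
    """Assumed PRF-style importance: query value times the mean value of the
    pseudo-relevant documents in each dimension (requires at least one doc)."""
    k = len(top_docs)
    centroid = [sum(doc[i] for doc in top_docs) / k for i in range(len(query))]
    return [q_i * c_i for q_i, c_i in zip(query, centroid)]


def top_dimensions(importance: List[float], keep_fraction: float) -> List[int]:
    """Indices of the highest-importance dimensions, keeping a fixed fraction
    shared by all queries (the hyperparameter a grid search would tune)."""
    n_keep = max(1, round(keep_fraction * len(importance)))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:n_keep])


def masked_score(query: Vector, doc: Vector, dims: List[int]) -> float:
    """Dot-product relevance restricted to the selected dimensions."""
    return sum(query[i] * doc[i] for i in dims)


query = [0.9, 0.1, -0.4, 0.3]
top_docs = [[0.8, -0.2, -0.5, 0.0], [0.6, 0.4, -0.3, 0.2]]
dims = top_dimensions(prf_dime_importance(query, top_docs), keep_fraction=0.5)
print(dims)  # [0, 2]
```

A per-query criterion, as the abstract describes, would replace the shared `keep_fraction` with a cutoff chosen separately for each query's importance scores.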