Pietro Lio
2026
The Mechanics of Interference: Defusing Distractors in RAG via Sparse Autoencoder Interventions
Christian Giannetti | Giovanni Trappolini | Nicola Tonellotto | Fabrizio Silvestri | Pietro Lio
Findings of the Association for Computational Linguistics: ACL 2026
Large language models exhibit a critical vulnerability to distractor interference in retrieval-augmented contexts: they fail to prioritize relevant, factually correct documents over topically similar but misleading content. We introduce Lat-Defuse, a mechanistic framework that corrects this failure mode through targeted interventions in the model’s latent space. Using Sparse Autoencoders (SAEs), our method operates in an interpretable feature space and formulates correction as constrained counterfactual optimization. On Gemma-2 and Llama-3 model families across three QA benchmarks (BioASQ, Natural Questions, PopQA), our method achieves recovery rates of up to 94% on distractor-vulnerable samples. Successful correction through sparse modifications reveals distractor interference as a localized, systematically addressable phenomenon, opening directions toward universal distractor robustness in LLMs.
2024
HyperBERT: Mixing Hypergraph-Aware Layers with Language Models for Node Classification on Text-Attributed Hypergraphs
Adrián Bazaga | Pietro Lio | Gos Micklem
Findings of the Association for Computational Linguistics: EMNLP 2024
Hypergraphs are characterized by complex topological structure, representing higher-order interactions among multiple entities through hyperedges. Lately, hypergraph-based deep learning methods to learn informative data representations for the problem of node classification on text-attributed hypergraphs have garnered increasing research attention. However, existing methods struggle to simultaneously capture the full extent of hypergraph structural information and the rich linguistic attributes inherent in the node attributes, which largely hampers their effectiveness and generalizability. To overcome these challenges, we explore ways to further augment a pretrained BERT model with specialized hypergraph-aware layers for the task of node classification. Such layers introduce higher-order structural inductive bias into the language model, thus improving the model’s capacity to harness both higher-order context information from the hypergraph structure and semantic information present in text. In this paper, we propose a new architecture, HyperBERT, a mixed text-hypergraph model which simultaneously models hypergraph relational structure while maintaining the high-quality text encoding capabilities of a pretrained BERT. Notably, HyperBERT presents results that achieve a new state-of-the-art on five challenging text-attributed hypergraph node classification benchmarks.
2022
Extending Logic Explained Networks to Text Classification
Rishabh Jain | Gabriele Ciravegna | Pietro Barbiero | Francesco Giannini | Davide Buffelli | Pietro Lio
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Recently, Logic Explained Networks (LENs) have been proposed as explainable-by-design neural models providing logic explanations for their predictions. However, these models have only been applied to vision and tabular data, and they mostly favour the generation of global explanations, while local ones tend to be noisy and verbose. For these reasons, we propose LEN<sup>p</sup>, improving local explanations by perturbing input words, and we test it on text classification. Our results show that (i) LEN<sup>p</sup> provides better local explanations than LIME in terms of sensitivity and faithfulness, and (ii) its logic explanations are more useful and user-friendly than the feature scoring provided by LIME, as attested by a human survey.