Tiziano Labruna


2025

PCoT: Persuasion-Augmented Chain of Thought for Detecting Fake News and Social Media Disinformation
Arkadiusz Modzelewski | Witold Sosnowski | Tiziano Labruna | Adam Wierzbicki | Giovanni Da San Martino
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Disinformation detection is a key aspect of media literacy. Psychological studies have shown that knowledge of persuasive fallacies helps individuals detect disinformation. Inspired by these findings, we experimented with large language models (LLMs) to test whether infusing persuasion knowledge enhances disinformation detection. As a result, we introduce the Persuasion-Augmented Chain of Thought (PCoT), a novel approach that leverages persuasion to improve disinformation detection in zero-shot classification. We extensively evaluate PCoT on online news and social media posts. Moreover, we publish two novel, up-to-date disinformation datasets: EUDisinfo and MultiDis. These datasets enable the evaluation of PCoT on content entirely unseen by the LLMs used in our experiments, as the content was published after the models’ knowledge cutoffs. We show that, on average, PCoT outperforms competitive methods by 15% across five LLMs and five datasets. These findings highlight the value of persuasion in strengthening zero-shot disinformation detection.

Positional Bias in Binary Question Answering: How Uncertainty Shapes Model Preferences
Tiziano Labruna | Simone Gallo | Giovanni Da San Martino
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Task-Oriented Dialogue Systems through Function Calling
Tiziano Labruna | Giovanni Bonetta | Bernardo Magnini
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Large Language Models (LLMs) have demonstrated remarkable capabilities in generating dialogues and handling a broad range of user queries. However, their effectiveness as end-to-end Task-Oriented Dialogue (TOD) systems remains limited due to their reliance on static parametric memory, which fails to accommodate evolving knowledge bases (KBs). This paper investigates a scalable function-calling approach that enables LLMs to retrieve only the necessary KB entries via schema-guided queries, rather than embedding the entire KB into each prompt. This selective retrieval strategy reduces prompt size and inference time while improving factual accuracy in system responses. We evaluate our method on the MultiWOZ 2.3 dataset and compare it against a full-KB baseline that injects the entire KB into every prompt. Experimental results show that our approach consistently outperforms the full-KB method in accuracy, while requiring significantly fewer input tokens and considerably less computation time, especially when the KB size increases.
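
The contrast between selective retrieval and the full-KB baseline can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge base, the slot names, and the helpers `query_kb`, `full_kb_prompt`, and `selective_prompt` are all hypothetical, and the structured call that an LLM would emit is hard-coded here.

```python
import json

# Hypothetical toy KB; entries and slot names are illustrative only.
KB = {
    "restaurant": [
        {"name": "Alpha Bistro", "food": "italian", "area": "centre", "pricerange": "cheap"},
        {"name": "Beta House", "food": "chinese", "area": "north", "pricerange": "expensive"},
        {"name": "Gamma Trattoria", "food": "italian", "area": "south", "pricerange": "moderate"},
    ]
}


def query_kb(domain, constraints):
    """Return only the KB entries whose slots match every constraint."""
    return [
        entry
        for entry in KB.get(domain, [])
        if all(entry.get(slot) == value for slot, value in constraints.items())
    ]


def full_kb_prompt(user_turn):
    """Baseline: serialize the entire KB into the prompt."""
    return f"KB: {json.dumps(KB)}\nUser: {user_turn}"


def selective_prompt(user_turn, call):
    """Function-calling variant: include only the rows returned by the query."""
    rows = query_kb(call["domain"], call["constraints"])
    return f"KB results: {json.dumps(rows)}\nUser: {user_turn}"


user_turn = "I want a cheap Italian place."
# Assumption: in a real system the model would produce this structured call.
call = {"domain": "restaurant", "constraints": {"food": "italian", "pricerange": "cheap"}}

full = full_kb_prompt(user_turn)
selective = selective_prompt(user_turn, call)
print(len(full), len(selective))
```

Even in this three-entry example the selective prompt is shorter than the full-KB prompt, and the gap widens as entries are added, which is the effect the abstract describes for larger KBs.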

When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Tiziano Labruna | Jon Ander Campos | Gorka Azkune
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the parametric memory of the LLM itself. Prior research has identified this phenomenon in the PopQA dataset, wherein the most popular questions are effectively addressed using the LLM’s parametric memory, while less popular ones require IR system usage. Following this, we propose a tailored training approach for LLMs, leveraging existing open-domain question answering datasets. Here, LLMs are trained to generate a special token, <RET>, when they do not know the answer to a question. Our evaluation of the Adaptive Retrieval LLM (Adapt-LLM) on the PopQA dataset showcases improvements over the same LLM under three configurations: (i) retrieving information for all the questions, (ii) always using the parametric memory of the LLM, and (iii) using a popularity threshold to decide when to use a retriever. Through our analysis, we demonstrate that Adapt-LLM is able to generate the <RET> token when it determines that it does not know how to answer a question, indicating the need for IR, while it achieves notably high accuracy levels when it chooses to rely only on its parametric memory.
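
The inference-time behaviour described above can be sketched as a two-pass loop. This is a minimal illustration, not the paper's code: `answer_adaptively`, `toy_generate`, and `toy_retrieve` are hypothetical names, and the stub "model" simply looks answers up in a dictionary rather than running a trained LLM.

```python
RET_TOKEN = "<RET>"


def answer_adaptively(question, generate, retrieve):
    """Ask the model first; call the retriever only if it emits RET_TOKEN.

    Returns a pair (answer, used_retrieval).
    """
    first_pass = generate(question, context=None)
    if RET_TOKEN not in first_pass:
        # The model answered from its parametric memory.
        return first_pass, False
    # The model signalled that it needs external context.
    passage = retrieve(question)
    return generate(question, context=passage), True


# Stub components standing in for a fine-tuned LLM and an IR system (assumptions).
KNOWN = {"What is the capital of France?": "Paris"}


def toy_generate(question, context=None):
    if context is not None:
        return f"Answer based on context: {context}"
    return KNOWN.get(question, RET_TOKEN)


def toy_retrieve(question):
    return "retrieved passage for: " + question


print(answer_adaptively("What is the capital of France?", toy_generate, toy_retrieve))
print(answer_adaptively("Who founded a little-known town?", toy_generate, toy_retrieve))
```

Passing `generate` and `retrieve` as arguments keeps the control flow independent of any particular model or retriever, so the same loop could wrap real components.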

2024

Are You a Good Assistant? Assessing LLM Trustability in Task-oriented Dialogues
Tiziano Labruna | Sofia Brenna | Giovanni Bonetta | Bernardo Magnini
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Despite the impressive capabilities of recent Large Language Models (LLMs) to generate human-like text, their ability to produce contextually appropriate content for specific communicative situations is still a matter of debate. This issue is particularly crucial when LLMs are employed as assistants to help solve tasks or achieve goals within a given conversational domain. In such scenarios, the assistant is expected to access specific knowledge (e.g., a database of restaurants, a calendar of appointments) that is not directly accessible to the user and must be consistently utilised to accomplish the task. In this paper, we conduct experiments to evaluate the trustworthiness of automatic assistants in task-oriented dialogues. Our findings indicate that state-of-the-art open-source LLMs still face significant challenges in maintaining logical consistency with a knowledge base of facts, highlighting the need for further advancements in this area.

Dynamic Task-Oriented Dialogue: A Comparative Study of Llama-2 and Bert in Slot Value Generation
Tiziano Labruna | Sofia Brenna | Bernardo Magnini
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Recent advancements in instruction-based language models have demonstrated exceptional performance across various natural language processing tasks. We present a comprehensive analysis of the performance of two open-source language models, BERT and Llama-2, in the context of dynamic task-oriented dialogues. Focusing on the Restaurant domain and utilizing the MultiWOZ 2.4 dataset, our investigation centers on the models’ ability to generate predictions for masked slot values within text. The dynamic aspect is introduced through simulated domain changes, mirroring real-world scenarios where new slot values are incrementally added to a domain over time. This study contributes to the understanding of instruction-based models’ effectiveness in dynamic natural language understanding tasks when compared to traditional language models and emphasizes the significance of open-source, reproducible models in advancing research within the academic community.

Towards Cost-effective Multi-style Conversations: A Pilot Study in Task-oriented Dialogue Generation
Tiziano Labruna | Bernardo Magnini
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Conversations exhibit significant variation when different styles are employed by participants, often leading to subpar performance when a dialogue model is exclusively trained on single-style datasets. We present a cost-effective methodology for generating multi-style conversations, which can be used in the development of conversational agents. This methodology only assumes the availability of a conversational domain, such as a knowledge base, and leverages the generative capabilities of large language models. In a pilot study focused on the generation aspect of task-oriented dialogues, we extended the well-known MultiWOZ dataset to encompass multi-style variations. Our findings highlight two key experimental outcomes: (i) these novel resources pose challenges for current single-style models, and (ii) multi-style resources enhance the dialogue model’s resilience to stylistic variations.

2023

Testing ChatGPT for Stability and Reasoning: A Case Study Using Italian Medical Specialty Tests
Silvia Casola | Tiziano Labruna | Alberto Lavelli | Bernardo Magnini
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Addressing Domain Changes in Task-oriented Conversational Agents through Dialogue Adaptation
Tiziano Labruna | Bernardo Magnini
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Recent task-oriented dialogue systems are trained on annotated dialogues, which, in turn, reflect certain domain information (e.g., restaurants or hotels in a given region). However, when such domain knowledge changes (e.g., new restaurants open), the initial dialogue model may become obsolete, decreasing the overall performance of the system. Through a number of experiments, we show, for instance, that adding 50% of new slot-values reduces dialogue state-tracker performance by about 55%. In light of such evidence, we suggest that automatic adaptation of training dialogues is a valuable option for re-training obsolete models. We experimented with a dialogue adaptation approach based on fine-tuning a generative language model on domain changes, showing that the performance decrease can be significantly reduced.

2021

From Cambridge to Pisa: A Journey into Cross-Lingual Dialogue Domain Adaptation for Conversational Agents
Tiziano Labruna | Bernardo Magnini
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

Addressing Slot-Value Changes in Task-oriented Dialogue Systems through Dialogue Domain Adaptation
Tiziano Labruna | Bernardo Magnini
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Recent task-oriented dialogue systems learn a model from annotated dialogues, and such dialogues are in turn collected and annotated so that they are consistent with certain domain knowledge. However, in real scenarios, domain knowledge is subject to frequent changes, and initial training dialogues may soon become obsolete, resulting in a significant decrease in the model performance. In this paper, we investigate the relationship between training dialogues and domain knowledge, and propose Dialogue Domain Adaptation, a methodology aiming at adapting initial training dialogues to changes that have occurred in the domain knowledge. We focus on slot-value changes (e.g., when new slot values are available to describe domain entities) and define an experimental setting for dialogue domain adaptation. First, we show that current state-of-the-art models for dialogue state tracking still lack robustness to slot-value changes in the domain knowledge. Then, we compare different domain adaptation strategies, showing that simple techniques are effective in reducing the gap between training dialogues and domain knowledge.