Pietro Ferrazzi
2026
Thesis Proposal: LLMs post-training for multilingual medical tasks. Instruction-Tuning, Continual-Pretraining or Reasoning?
Pietro Ferrazzi | Alberto Lavelli | Bernardo Magnini
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Adapting Large Language Models to the medical domain remains an active area of research, with multiple strategies proposed to leverage annotated and unannotated data effectively. In this work, we propose a thesis outline to compare three common adaptation approaches: Instruction Tuning, Continual Pretraining, and Reasoning-oriented Training. We identify 5 dimensions to analyse: i) the interaction between the adaptation technique and the tasks; ii) the impact of the data size on the downstream performance; iii) the differences between the datasets required by the three techniques; iv) the impact of the techniques given the model size; v) the impact of the techniques given the language. We construct an evaluation framework composed of 5 multilingual medical NLP tasks (named entity recognition, relation extraction, question answering, case report form filling, argument mining), spanning 21 datasets in English, Italian, and Spanish, for a total of 61 combinations of language and sub-task.
Is Agentic RAG worth it? An experimental comparison of RAG approaches
Pietro Ferrazzi | Milica Cvjetićanin | Alessio Piraccini | Davide Giannuzzi
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Retrieval-Augmented Generation (RAG) systems are usually defined by the combination of a generator and a retrieval component that extracts textual context from a knowledge base to answer user queries. However, such basic implementations exhibit several limitations, including noisy or suboptimal retrieval, misuse of retrieval for out-of-scope queries, weak query–document matching, and variability or cost associated with the generator. These shortcomings have motivated the development of "Enhanced" RAG, where dedicated modules are introduced to address specific weaknesses in the workflow. More recently, the growing self-reflective capabilities of Large Language Models (LLMs) have enabled a new paradigm, often referred to as "Agentic" RAG. In this approach, an LLM orchestrates the entire process, deciding which actions to perform, when to perform them, and whether to iterate. Despite the rapid adoption of both paradigms, it remains unclear which approach is preferable under which conditions. In this work, we conduct an empirically driven evaluation of "Enhanced" and "Agentic" RAG across multiple scenarios and dimensions. Our results provide practical insights into the trade-offs between the two paradigms, offering guidance on selecting the most effective RAG design for real-world applications, considering both performance and costs.
2025
Converting Annotated Clinical Cases into Structured Case Report Forms
Pietro Ferrazzi | Alberto Lavelli | Bernardo Magnini
Proceedings of the 24th Workshop on Biomedical Language Processing
Case Report Forms (CRFs) are widely used in medical research as they ensure accuracy, reliability, and validity of results in clinical studies. However, publicly available, well-annotated CRF datasets are scarce, limiting the development of CRF slot filling systems able to fill in a CRF from clinical notes. To mitigate the scarcity of CRF datasets, we propose to take advantage of available datasets annotated for information extraction tasks and to convert them into structured CRFs. We present a semi-automatic conversion methodology, which has been applied to the E3C dataset in two languages (English and Italian), resulting in a new, high-quality dataset for CRF slot filling. Through several experiments on the created dataset, we report that slot filling achieves 59.7% for Italian and 67.3% for English with a closed Large Language Model (zero-shot) and lower performance with three families of open-source models, showing that filling CRFs is challenging even for recent state-of-the-art LLMs.