Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study

Livia Lilli, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri, Stefano Patarnello


Abstract
Large Language Models (LLMs) have significantly impacted medical Natural Language Processing (NLP), enabling automated information extraction from unstructured clinical texts. However, selecting the most suitable approach requires careful evaluation of different model architectures, such as generative LLMs and BERT-based models, along with appropriate adaptation strategies, including prompting techniques or fine-tuning. Several studies have explored different LLM implementations, highlighting their effectiveness in the medical domain, including complex diagnostic patterns such as those found in rheumatology. However, their application to Italian remains limited, serving as a key example of the broader gap in non-English language research. In this study, we present a task-specific benchmark analysis comparing generative LLMs and BERT-based models on real-world Italian clinical reports. We evaluated zero-shot prompting, in-context learning (ICL), and fine-tuning across eight diagnostic categories in the rheumatology area. Results show that ICL improves performance over zero-shot prompting, particularly for the Mixtral and Gemma models. Overall, fine-tuned BERT achieves the highest performance, while ICL outperforms BERT on specific diagnoses, such as renal and systemic, suggesting that prompting can be a viable alternative when labeled data are scarce.
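The abstract contrasts zero-shot prompting with in-context learning (ICL), where labeled examples are prepended to the prompt. A minimal illustrative sketch of the difference between the two prompt formats is below; the helper name, category label, and Italian snippets are hypothetical and not taken from the study.

```python
# Hypothetical sketch of zero-shot vs. in-context-learning (ICL) prompt
# construction for diagnosis classification; not the authors' actual prompts.

def build_prompt(report, examples=None):
    """Build a zero-shot prompt (examples=None) or an ICL prompt
    (examples = list of (report_text, diagnosis_label) pairs)."""
    parts = [
        "Classify the following Italian rheumatology report "
        "into one of the diagnostic categories.\n"
    ]
    # ICL simply prepends labeled demonstrations before the query.
    for ex_text, ex_label in (examples or []):
        parts.append(f"Report: {ex_text}\nDiagnosis: {ex_label}\n")
    parts.append(f"Report: {report}\nDiagnosis:")
    return "\n".join(parts)

zero_shot = build_prompt("Paziente con artrite e dolore articolare...")
icl = build_prompt(
    "Paziente con artrite e dolore articolare...",
    examples=[("Quadro di coinvolgimento renale...", "renal")],
)
```

The only structural difference is the block of labeled demonstrations; the same instruction and query format are reused in both settings.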
Anthology ID:
2025.bionlp-1.17
Volume:
ACL 2025
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
190–200
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.17/
Cite (ACL):
Livia Lilli, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri, and Stefano Patarnello. 2025. Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study. In ACL 2025, pages 190–200, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study (Lilli et al., BioNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.17.pdf
Supplementary material:
 2025.bionlp-1.17.SupplementaryMaterial.txt
Supplementary material:
 2025.bionlp-1.17.SupplementaryMaterial.zip