Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study
Livia Lilli, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri, Stefano Patarnello
Abstract
Large Language Models (LLMs) have significantly impacted medical Natural Language Processing (NLP), enabling automated information extraction from unstructured clinical texts. However, selecting the most suitable approach requires careful evaluation of different model architectures, such as generative LLMs and BERT-based models, along with appropriate adaptation strategies, including prompting techniques or fine-tuning. Several studies have explored different LLM implementations, highlighting their effectiveness in the medical domain, including complex diagnostic patterns such as those found in rheumatology. However, their application to Italian remains limited, serving as a key example of the broader gap in non-English language research. In this study, we present a task-specific benchmark analysis comparing generative LLMs and BERT-based models on real-world Italian clinical reports. We evaluated zero-shot prompting, in-context learning (ICL), and fine-tuning across eight diagnostic categories in the rheumatology area. Results show that ICL improves performance over zero-shot prompting, particularly for the Mixtral and Gemma models. Overall, BERT fine-tuning presents the highest performance, while ICL outperforms BERT in specific diagnoses, such as renal and systemic, suggesting that prompting can be a viable alternative when labeled data is scarce.
- Anthology ID:
- 2025.bionlp-1.17
- Volume:
- ACL 2025
- Month:
- August
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
- Venues:
- BioNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 190–200
- URL:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.17/
- Cite (ACL):
- Livia Lilli, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri, and Stefano Patarnello. 2025. Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study. In ACL 2025, pages 190–200, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study (Lilli et al., BioNLP 2025)
- PDF:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.17.pdf