Thesis Proposal: LLMs post-training for multilingual medical tasks. Instruction-Tuning, Continual-Pretraining or Reasoning?

Pietro Ferrazzi; Alberto Lavelli; Bernardo Magnini

Abstract

Adapting Large Language Models to the medical domain remains an active area of research, with multiple strategies proposed to leverage annotated and unannotated data effectively. In this work, we propose a thesis outline to compare three common adaptation approaches: Instruction Tuning, Continual Pretraining, and Reasoning-oriented Training. We identify 5 dimensions to analyse: i) the interaction between the adaptation technique and the tasks; ii) the impact of the data size on the downstream performance; iii) the differences between the datasets required by the three techniques; iv) the impact of the techniques given the model size; v) the impact of the techniques given the language. We construct an evaluation framework composed of 5 multilingual medical NLP tasks (named entity recognition, relation extraction, question answering, case report form filling, argument mining), spanning 21 datasets in English, Italian, and Spanish, for a total of 61 combinations of language and sub-task.

Anthology ID: 2026.acl-srw.9
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 110–122
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.9/
Cite (ACL): Pietro Ferrazzi, Alberto Lavelli, and Bernardo Magnini. 2026. Thesis Proposal: LLMs post-training for multilingual medical tasks. Instruction-Tuning, Continual-Pretraining or Reasoning? In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 110–122, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Thesis Proposal: LLMs post-training for multilingual medical tasks. Instruction-Tuning, Continual-Pretraining or Reasoning? (Ferrazzi et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.9.pdf
