Osman Alperen Koraş


2025

Towards Conditioning Clinical Text Generation for User Control
Osman Alperen Koraş | Rabi Bahnan | Jens Kleesiek | Amin Dada
Findings of the Association for Computational Linguistics: ACL 2025

Deploying natural language generation systems in clinical settings remains challenging despite advances in Large Language Models (LLMs), which continue to exhibit hallucinations and factual inconsistencies, necessitating human oversight. This paper explores automated dataset augmentation using LLMs as human proxies to condition LLMs for clinician control without increasing cognitive workload. On the BioNLP ACL’24 Discharge Me! Shared Task, we achieve new state-of-the-art results with simpler methods than prior submissions through more efficient training, yielding a 9% relative improvement without augmented training and up to 34% with dataset augmentation. Preliminary human evaluation further supports the effectiveness of our approach, highlighting the potential of augmenting clinical text generation for control to enhance relevance, accuracy, and factual consistency.

Does Biomedical Training Lead to Better Medical Performance?
Amin Dada | Osman Alperen Koraş | Marie Bauer | Jean-Philippe Corbeil | Amanda Butler Contreras | Constantin Marc Seibold | Kaleb E Smith | Julian.friedrich@uk-essen.de | Jens Kleesiek
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)

Large Language Models (LLMs) hold significant potential for improving healthcare applications, with biomedically adapted models promising enhanced performance on medical tasks. However, the effectiveness of biomedical domain adaptation for clinical tasks remains uncertain. In this study, we conduct a direct comparison of 12 biomedically adapted models and their general-domain base counterparts across six clinical tasks. Our results reveal that 11 out of 12 biomedical models exhibit performance declines, challenging prior findings that reported positive effects of biomedical adaptation. Notably, previous positive results primarily relied on multiple-choice evaluations, which may not reflect performance in real-world clinical applications. To promote reproducibility and further research, we open-source our evaluation pipeline, providing a resource for the development of models with practical benefits in healthcare settings.