Hisada Shohei

Also published as: HISADA Shohei


2026

Recent advances in large language models (LLMs) have accelerated NLP applications in the medical and clinical domains. However, evaluations remain limited for non-English languages, such as Japanese, where clinical corpora are particularly scarce. To address this gap, we present J-ClinicalBench, a publicly available benchmark designed to reflect realistic Japanese clinical tasks. We first created 227 expert-authored clinical documents and constructed five new datasets for core clinical tasks. Building on these datasets, J-ClinicalBench comprises nine clinical tasks spanning clinical language reasoning, generation, and understanding. We establish baseline performance on J-ClinicalBench by evaluating state-of-the-art proprietary and Japanese open-source LLMs, providing the first assessment of their utility in practical clinical scenarios. By releasing this benchmark, we aim to foster the development and evaluation of clinically applicable LLMs in Japanese healthcare, bridging the current gap between clinical NLP research and clinical practice.

2025

In-hospital text data contains valuable clinical information, yet deploying fine-tuned small language models (SLMs) for information extraction remains challenging due to differences in formatting and vocabulary across institutions. Since access to the original in-hospital data (source domain) is often restricted, annotated data from the target hospital (target domain) is crucial for domain adaptation. However, clinical annotation is notoriously expensive and time-consuming, as it demands both clinical and linguistic expertise. To address this issue, we leverage large language models (LLMs) to annotate target-domain data for adaptation. We conduct experiments on four clinical information extraction tasks across eight target-domain datasets. Experimental results show that LLM-annotated data consistently enhances SLM performance and, given a sufficiently large amount of annotated data, outperforms manual annotation in three of the four tasks.