JPPB: Automatic Construction of a Soft-Labeled Japanese Patient Phrase Bank for Symptom Normalization

Tomohiro Nishiyama, Mana Kuramoto, Shoko Wakamiya, Eiji ARAMAKI


Abstract
Patient-generated symptom expressions are linguistically diverse, often deviating from standardized medical terminology. This paper introduces the Japanese Patient Phrase Bank (JPPB), the first automatically constructed phrase-level normalization resource for Japanese patient language. JPPB introduces an embedding-based soft labeling framework that transforms traditional one-to-one dictionary mappings into graded and ambiguity-aware associations. This framework represents a shift from word-level to phrase-level normalization in Japanese. The resource covers 7,035 phrase–term pairs across 412 symptoms. Evaluation on the KEEPHA and MedNLP-SC datasets shows that soft labels consistently improve Top-1 accuracy and better approximate gold label distributions compared with hard labels. While LLM-based normalization achieved the highest scores, JPPB provides a lightweight and transparent alternative suitable for local deployment. This work demonstrates that large-scale, automatically generated phrase banks can achieve competitive performance relative to manually curated resources and serve as practical, scalable resources for medical natural language processing in Japanese.
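As a rough illustration of the contrast the abstract draws between hard and soft labels, the sketch below turns embedding similarities into a graded distribution over candidate symptom terms and scores Top-1 accuracy. It is a minimal sketch under stated assumptions, not the authors' pipeline: the toy 2-dimensional vectors, the temperature-scaled softmax, and the names `cosine`, `soft_labels`, and `top1_accuracy` are all hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_labels(phrase_vec, term_vecs, temperature=0.1):
    """Map a phrase embedding to a probability distribution over terms.

    Assumption: a softmax over cosine similarities; the paper may use a
    different scoring or normalization scheme.
    """
    sims = {term: cosine(phrase_vec, vec) for term, vec in term_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {term: math.exp((s - top) / temperature) for term, s in sims.items()}
    total = sum(exps.values())
    return {term: e / total for term, e in exps.items()}


def top1_accuracy(predictions, gold):
    """Fraction of examples whose highest-weighted term equals the gold term."""
    hits = sum(1 for dist, g in zip(predictions, gold) if max(dist, key=dist.get) == g)
    return hits / len(gold)


# Toy, made-up embeddings for two standard terms and one patient phrase.
term_vecs = {"headache": [1.0, 0.0], "dizziness": [0.0, 1.0]}
phrase_vec = [0.8, 0.6]

labels = soft_labels(phrase_vec, term_vecs)
accuracy = top1_accuracy([labels], ["headache"])
```

A hard label would keep only the highest-weighted term, whereas the soft label retains the remaining mass on the other candidates, which is what allows comparison against a gold label distribution.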
Anthology ID:
2026.lrec-main.621
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
7816–7828
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.621/
Cite (ACL):
Tomohiro Nishiyama, Mana Kuramoto, Shoko Wakamiya, and Eiji ARAMAKI. 2026. JPPB: Automatic Construction of a Soft-Labeled Japanese Patient Phrase Bank for Symptom Normalization. International Conference on Language Resources and Evaluation, main:7816–7828.
Cite (Informal):
JPPB: Automatic Construction of a Soft-Labeled Japanese Patient Phrase Bank for Symptom Normalization (Nishiyama et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.621.pdf