2025
MultiMSD: A Corpus for Multilingual Medical Text Simplification from Online Medical References
Koki Horiguchi, Tomoyuki Kajiwara, Takashi Ninomiya, Shoko Wakamiya, Eiji Aramaki
Findings of the Association for Computational Linguistics: ACL 2025
We release a parallel corpus for medical text simplification, which paraphrases medical terms into expressions easily understood by patients. Medical texts written by medical practitioners contain many technical terms, and patients who are non-experts are often unable to use the information effectively. Therefore, there is a strong social demand for medical text simplification that paraphrases input sentences without using medical terms. However, this task has not been sufficiently studied in languages other than English. We therefore developed parallel corpora for medical text simplification in nine languages (German, English, Spanish, French, Italian, Japanese, Portuguese, Russian, and Chinese), each with 10,000 sentence pairs, by automatically aligning sentences in online medical references written for professionals and for consumers. We also propose a method for training text simplification models to actively paraphrase complex expressions, including medical terms. Experimental results show that the proposed method improves the performance of medical text simplification. In addition, we confirmed that training on a multilingual dataset is more effective than training on a monolingual dataset.
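The abstract does not say how sentences were aligned. Purely as an illustration of the general idea, the following sketch pairs each sentence from a professional reference with its most similar sentence in the corresponding consumer reference, keeping only pairs above a similarity threshold. Character-trigram Jaccard similarity is a stand-in scorer we chose for simplicity; the scorer, the `threshold` value, and all function names are assumptions, not the authors' method.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams; needs no tokenizer, so it also works for Japanese or Chinese."""
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_sentences(
    professional: list[str], consumer: list[str], threshold: float = 0.3
) -> list[tuple[str, str, float]]:
    """Hypothetical aligner (not the paper's method): pair each professional
    sentence with its most similar consumer sentence, dropping weak matches."""
    consumer_grams = [char_ngrams(s) for s in consumer]
    pairs = []
    for src in professional:
        src_grams = char_ngrams(src)
        scores = [jaccard(src_grams, g) for g in consumer_grams]
        if not scores:  # no consumer sentences to align against
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:  # assumed cut-off for a usable pair
            pairs.append((src, consumer[best], scores[best]))
    return pairs


pairs = align_sentences(
    ["Hypertension is a chronic condition."],
    ["Eat more vegetables.", "Hypertension is a long-lasting condition."],
    threshold=0.0,
)
print(pairs[0][1])
```

Character n-grams keep the sketch language-agnostic across the nine languages; a practical pipeline could instead score pairs with multilingual sentence embeddings.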
Text Normalization for Japanese Sentiment Analysis
Risa Kondo, Ayu Teramen, Reon Kajikawa, Koki Horiguchi, Tomoyuki Kajiwara, Takashi Ninomiya, Hideaki Hayashi, Yuta Nakashima, Hajime Nagahara
Proceedings of the Tenth Workshop on Noisy and User-generated Text
We manually normalize noisy Japanese expressions on social networking services (SNS) to improve the performance of sentiment polarity classification. Despite advances in pre-trained language models, informal expressions found in social media still plague natural language processing. In this study, we analyzed 6,000 posts from a sentiment analysis corpus for Japanese SNS text and constructed a text normalization taxonomy consisting of 33 types of editing operations. Text normalization according to our taxonomy significantly improved the performance of BERT-based sentiment analysis in Japanese. Detailed analysis reveals that most types of editing operations each contribute to improving sentiment analysis performance.
2024
Evaluation Dataset for Japanese Medical Text Simplification
Koki Horiguchi, Tomoyuki Kajiwara, Yuki Arase, Takashi Ninomiya
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
We create a parallel corpus for medical text simplification in Japanese, which simplifies medical terms into expressions that patients can understand without effort. While text simplification in the medical domain is strongly desired by society, it is less explored in Japanese because of the lack of language resources. In this study, we build a parallel corpus for evaluating Japanese text simplification in the medical domain using patients’ weblogs. This corpus consists of 1,425 pairs of complex and simple sentences with or without medical terms. To tackle medical text simplification without a training corpus in the corresponding domain, we repurpose a Japanese text simplification model trained on other domains. Furthermore, we propose a lexically constrained reranking method that prevents technical terms from being output. Experimental results show that our method contributes to higher simplification performance in the medical domain.
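The abstract names lexically constrained reranking but not its scoring rule. One plausible reading, assumed here only for illustration, is to rerank a model's n-best outputs so that candidates containing terms from a medical-term lexicon are penalized. The `Candidate` type, the `medical_terms` lexicon, the linear `penalty`, and the function name are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    log_prob: float  # model score for this output, e.g. a sequence log-probability


def rerank_avoiding_terms(
    candidates: list[Candidate], medical_terms: set[str], penalty: float = 10.0
) -> list[Candidate]:
    """Hypothetical reranker: score = log_prob - penalty * (number of lexicon
    terms found in the text); highest score first."""

    def score(c: Candidate) -> float:
        # Substring matching, since Japanese text has no spaces between words.
        hits = sum(1 for term in medical_terms if term in c.text)
        return c.log_prob - penalty * hits

    return sorted(candidates, key=score, reverse=True)


n_best = [
    Candidate("You have hypertension.", -1.0),
    Candidate("You have high blood pressure.", -2.5),
]
ranked = rerank_avoiding_terms(n_best, {"hypertension"})
print(ranked[0].text)
```

If `penalty` exceeds the spread of model scores in the n-best list, every candidate free of lexicon terms outranks every candidate that contains one; smaller values trade term avoidance against model confidence.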