Hideaki Hayashi


2025

Text Normalization for Japanese Sentiment Analysis
Risa Kondo | Ayu Teramen | Reon Kajikawa | Koki Horiguchi | Tomoyuki Kajiwara | Takashi Ninomiya | Hideaki Hayashi | Yuta Nakashima | Hajime Nagahara
Proceedings of the Tenth Workshop on Noisy and User-generated Text

We manually normalize noisy Japanese expressions on social networking services (SNS) to improve the performance of sentiment polarity classification. Despite advances in pre-trained language models, informal expressions found in social media still plague natural language processing. In this study, we analyzed 6,000 posts from a sentiment analysis corpus for Japanese SNS text and constructed a text normalization taxonomy consisting of 33 types of editing operations. Text normalization according to our taxonomy significantly improved the performance of BERT-based sentiment analysis in Japanese. Detailed analysis reveals that most types of editing operations each contribute to improving the performance of sentiment analysis.