Text Normalization for Japanese Sentiment Analysis
Risa Kondo, Ayu Teramen, Reon Kajikawa, Koki Horiguchi, Tomoyuki Kajiwara, Takashi Ninomiya, Hideaki Hayashi, Yuta Nakashima, Hajime Nagahara
Abstract
We manually normalize noisy Japanese expressions on social networking services (SNS) to improve the performance of sentiment polarity classification. Despite advances in pre-trained language models, informal expressions found in social media still plague natural language processing. In this study, we analyzed 6,000 posts from a sentiment analysis corpus for Japanese SNS text and constructed a text normalization taxonomy consisting of 33 types of editing operations. Text normalization according to our taxonomy significantly improved the performance of BERT-based sentiment analysis in Japanese. Detailed analysis reveals that most types of editing operations each contribute to improving the performance of sentiment analysis.
- Anthology ID:
- 2025.wnut-1.16
- Volume:
- Proceedings of the Tenth Workshop on Noisy and User-generated Text
- Month:
- May
- Year:
- 2025
- Address:
- Albuquerque, New Mexico, USA
- Editors:
- JinYeong Bak, Rob van der Goot, Hyeju Jang, Weerayut Buaphet, Alan Ramponi, Wei Xu, Alan Ritter
- Venues:
- WNUT | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 149–157
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.wnut-1.16/
- Cite (ACL):
- Risa Kondo, Ayu Teramen, Reon Kajikawa, Koki Horiguchi, Tomoyuki Kajiwara, Takashi Ninomiya, Hideaki Hayashi, Yuta Nakashima, and Hajime Nagahara. 2025. Text Normalization for Japanese Sentiment Analysis. In Proceedings of the Tenth Workshop on Noisy and User-generated Text, pages 149–157, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Text Normalization for Japanese Sentiment Analysis (Kondo et al., WNUT 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.wnut-1.16.pdf