Hajime Nagahara


A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain
Haruya Suzuki | Yuto Miyauchi | Kazuki Akiyama | Tomoyuki Kajiwara | Takashi Ninomiya | Noriko Takemura | Yuta Nakashima | Hajime Nagahara
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We annotate 35,000 SNS posts with both the writer’s subjective sentiment polarity labels and the reader’s objective ones to construct a Japanese sentiment analysis dataset. Our dataset includes intensity labels (none, weak, medium, and strong) for each of the eight basic emotions by Plutchik (joy, sadness, anticipation, surprise, anger, fear, disgust, and trust) as well as sentiment polarity labels (strong positive, positive, neutral, negative, and strong negative). Previous studies on emotion analysis have studied the analysis of basic emotions and sentiment polarity independently. In other words, there are few corpora that are annotated with both basic emotions and sentiment polarity. Our dataset is the first large-scale corpus to annotate both of these emotion labels, and from both the writer’s and reader’s perspectives. In this paper, we analyze the relationship between basic emotion intensity and sentiment polarity on our dataset and report the results of benchmarking sentiment polarity classification.

pdf bib
Emotional Intensity Estimation based on Writer’s Personality
Haruya Suzuki | Sora Tarumoto | Tomoyuki Kajiwara | Takashi Ninomiya | Yuta Nakashima | Hajime Nagahara
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop

We propose a method for personalized emotional intensity estimation based on a writer’s personality test for Japanese SNS posts. Existing emotion analysis models are difficult to accurately estimate the writer’s subjective emotions behind the text. We personalize the emotion analysis using not only the text but also the writer’s personality information. Experimental results show that personality information improves the performance of emotional intensity estimation. Furthermore, a hybrid model combining the existing personalized method with ours achieved state-of-the-art performance.


WRIME: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations
Tomoyuki Kajiwara | Chenhui Chu | Noriko Takemura | Yuta Nakashima | Hajime Nagahara
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We annotate 17,000 SNS posts with both the writer’s subjective emotional intensity and the reader’s objective one to construct a Japanese emotion analysis dataset. In this study, we explore the difference between the emotional intensity of the writer and that of the readers with this dataset. We found that the reader cannot fully detect the emotions of the writer, especially anger and trust. In addition, experimental results in estimating the emotional intensity show that it is more difficult to estimate the writer’s subjective labels than the readers’. The large gap between the subjective and objective emotions imply the complexity of the mapping from a post to the subjective emotion intensities, which also leads to a lower performance with machine learning models.


IDSOU at WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets
Sora Ohashi | Tomoyuki Kajiwara | Chenhui Chu | Noriko Takemura | Yuta Nakashima | Hajime Nagahara
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

We introduce the IDSOU submission for the WNUT-2020 task 2: identification of informative COVID-19 English Tweets. Our system is an ensemble of pre-trained language models such as BERT. We ranked 16th in the F1 score.

Constructing a Public Meeting Corpus
Koji Tanaka | Chenhui Chu | Haolin Ren | Benjamin Renoust | Yuta Nakashima | Noriko Takemura | Hajime Nagahara | Takao Fujikawa
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we propose a full pipeline of analysis of a large corpus about a century of public meeting in historical Australian news papers, from construction to visual exploration. The corpus construction method is based on image processing and OCR. We digitize and transcribe texts of the specific topic of public meeting. Experiments show that our proposed method achieves a F-score of 87.8% for corpus construction. As a result, we built a content search tool for temporal and semantic content analysis.