Yuki Nakayama

2024

pdf abs
Search Query Refinement for Japanese Named Entity Recognition in E-commerce Domain
Yuki Nakayama | Ryutaro Tatsushima | Erick Mendieta | Koji Murakami | Keiji Shinzato
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

In the E-Commerce domain, search query refinement reformulates malformed queries into canonicalized forms by preprocessing operations such as “term splitting” and “term merging”. Unfortunately, most relevant research is rather limited to English. In particular, there is a severe lack of study on search query refinement for the Japanese language. Furthermore, no attempt has ever been made to apply refinement methods to data improvement for downstream NLP tasks in real-world scenarios.This paper presents a novel query refinement approach for the Japanese language. Experimental results show that our method achieves significant improvement by 3.5 points through comparison with BERT-CRF as a baseline. Further experiments are also conducted to measure beneficial impact of query refinement on named entity recognition (NER) as the downstream task. Evaluations indicate that the proposed query refinement method contributes to better data quality, leading to performance boost on E-Commerce specific NER tasks by 11.7 points, compared to search query data preprocessed by MeCab, a very popularly adopted Japanese tokenizer.

2022

pdf abs
A Stacking-based Efficient Method for Toxic Language Detection on Live Streaming Chat
Yuto Oikawa | Yuki Nakayama | Koji Murakami
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

In a live streaming chat on a video streaming service, it is crucial to filter out toxic comments with online processing to prevent users from reading comments in real-time. However, recent toxic language detection methods rely on deep learning methods, which can not be scalable considering inference speed. Also, these methods do not consider constraints of computational resources expected depending on a deployed system (e.g., no GPU resource).This paper presents an efficient method for toxic language detection that is aware of real-world scenarios. Our proposed architecture is based on partial stacking that feeds initial results with low confidence to meta-classifier. Experimental results show that our method achieves a much faster inference speed than BERT-based models with comparable performance.

pdf abs
A Large-Scale Japanese Dataset for Aspect-based Sentiment Analysis
Yuki Nakayama | Koji Murakami | Gautam Kumar | Sudha Bhingardive | Ikuko Hardaway
Proceedings of the Thirteenth Language Resources and Evaluation Conference

There has been significant progress in the field of sentiment analysis. However, aspect-based sentiment analysis (ABSA) has not been explored in the Japanese language even though it has a huge scope in many natural language processing applications such as 1) tracking sentiment towards products, movies, politicians etc; 2) improving customer relation models. The main reason behind this is that there is no standard Japanese dataset available for ABSA task. In this paper, we present the first standard Japanese dataset for the hotel reviews domain. The proposed dataset contains 53,192 review sentences with seven aspect categories and two polarity labels. We perform experiments on this dataset using popular ABSA approaches and report error analysis. Our experiments show that contextual models such as BERT works very well for the ABSA task in the Japanese language and also show the need to focus on other NLP tasks for better performance through our error analysis.

Yuki Nakayama

2024

2022

2015

2013

Co-authors

Venues