Search Query Refinement for Japanese Named Entity Recognition in E-commerce Domain

Yuki Nakayama, Ryutaro Tatsushima, Erick Mendieta, Koji Murakami, Keiji Shinzato


Abstract
In the e-commerce domain, search query refinement reformulates malformed queries into canonical forms through preprocessing operations such as “term splitting” and “term merging”. Most existing research, however, is limited to English, and search query refinement for Japanese in particular has received very little study. Furthermore, no attempt has been made to apply refinement methods to improve data for downstream NLP tasks in real-world scenarios. This paper presents a novel query refinement approach for the Japanese language. Experimental results show that our method achieves a significant improvement of 3.5 points over a BERT-CRF baseline. Further experiments measure the impact of query refinement on named entity recognition (NER) as the downstream task. Evaluations indicate that the proposed query refinement method yields better data quality, boosting performance on e-commerce-specific NER tasks by 11.7 points compared to search query data preprocessed with MeCab, a widely used Japanese tokenizer.
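
As a rough illustration of the two operations named in the abstract, the sketch below applies "term merging" and "term splitting" to whitespace-delimited queries using a tiny lookup table. It is not the method proposed in the paper, which the abstract does not describe; the lexicon, the example queries, and the function names `split_term`, `merge_terms`, and `refine` are all assumptions made for this example.

```python
# Toy illustration only: a hypothetical lexicon, not the paper's model or data.
LEXICON = {"ナイキ", "スニーカー", "ルイヴィトン", "財布"}


def split_term(term, lexicon):
    """Term splitting: break one term into two lexicon entries if possible."""
    for i in range(1, len(term)):
        left, right = term[:i], term[i:]
        if left in lexicon and right in lexicon:
            return [left, right]
    return [term]  # no valid split found, keep the term unchanged


def merge_terms(terms, lexicon):
    """Term merging: join adjacent terms whose concatenation is in the lexicon."""
    merged, i = [], 0
    while i < len(terms):
        if i + 1 < len(terms) and terms[i] + terms[i + 1] in lexicon:
            merged.append(terms[i] + terms[i + 1])
            i += 2
        else:
            merged.append(terms[i])
            i += 1
    return merged


def refine(query, lexicon=LEXICON):
    """Merge first, then split any remaining term that is not in the lexicon."""
    refined = []
    for term in merge_terms(query.split(), lexicon):
        if term in lexicon:
            refined.append(term)
        else:
            refined.extend(split_term(term, lexicon))
    return " ".join(refined)


print(refine("ナイキスニーカー"))      # splitting case
print(refine("ルイ ヴィトン 財布"))    # merging case
```

With this toy lexicon, the first query is split into `ナイキ スニーカー` and the second is merged into `ルイヴィトン 財布`. A lookup table only works for terms it already contains, which is one reason a learned model might be preferred in practice.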
Anthology ID: 2024.naacl-industry.39
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Yi Yang, Aida Davani, Avi Sil, Anoop Kumar
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 447–452
URL: https://aclanthology.org/2024.naacl-industry.39
DOI: 10.18653/v1/2024.naacl-industry.39
Cite (ACL): Yuki Nakayama, Ryutaro Tatsushima, Erick Mendieta, Koji Murakami, and Keiji Shinzato. 2024. Search Query Refinement for Japanese Named Entity Recognition in E-commerce Domain. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), pages 447–452, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Search Query Refinement for Japanese Named Entity Recognition in E-commerce Domain (Nakayama et al., NAACL 2024)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.naacl-industry.39.pdf