QueryNER: Segmentation of E-commerce Queries

Chester Palen-Michel, Lizzie Liang, Zhe Wu, Constantine Lignos


Abstract
We present QueryNER, a manually annotated dataset and accompanying model for e-commerce query segmentation. Prior work in sequence labeling for e-commerce has largely addressed aspect-value extraction, which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on dividing a query into meaningful chunks with broadly applicable types. We report baseline tagging results and conduct experiments comparing token and entity dropping for null and low-recall query recovery. We create challenging test sets using automatic transformations and show that simple data augmentation techniques can make the models more robust to noise. We make the QueryNER dataset publicly available.
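
To make the contrast between token dropping and entity dropping concrete, the following is a minimal sketch, not the authors' implementation. It assumes a query has already been tagged in BIO format; the example query, the label names (`color`, `product`, `size`), and the helper functions `bio_to_spans`, `drop_token`, and `drop_entity` are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A span is (start index, exclusive end index, label). Hypothetical helper type.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert BIO tags into labeled token spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span unless this tag continues it.
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Tolerate an I- tag with no preceding B- by opening a new span.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def drop_token(tokens: List[str], index: int) -> List[str]:
    """Token dropping: remove a single token, ignoring segment boundaries."""
    return tokens[:index] + tokens[index + 1:]


def drop_entity(tokens: List[str], spans: List[Span], span_index: int) -> List[str]:
    """Entity dropping: remove every token of one tagged segment."""
    start, end, _ = spans[span_index]
    return tokens[:start] + tokens[end:]


# Hypothetical query and tags, for illustration only.
tokens = ["red", "running", "shoes", "size", "10"]
tags = ["B-color", "B-product", "I-product", "B-size", "I-size"]
spans = bio_to_spans(tags)

token_dropped = drop_token(tokens, 3)           # removes only "size"
entity_dropped = drop_entity(tokens, spans, 2)  # removes the whole "size 10" segment
```

In this sketch, dropping the single token "size" leaves a stranded "10" in the rewritten query, while dropping the whole segment removes "size 10" as a unit. Which strategy recovers more results for a null or low-recall query is an empirical question that the paper's experiments address.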
Anthology ID:
2024.lrec-main.1178
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
13455–13470
URL:
https://aclanthology.org/2024.lrec-main.1178
Cite (ACL):
Chester Palen-Michel, Lizzie Liang, Zhe Wu, and Constantine Lignos. 2024. QueryNER: Segmentation of E-commerce Queries. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13455–13470, Torino, Italia. ELRA and ICCL.
Cite (Informal):
QueryNER: Segmentation of E-commerce Queries (Palen-Michel et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2024.lrec-main.1178.pdf