Hao Feng
2022
Knowledge Distillation based Contextual Relevance Matching for E-commerce Product Search
Ziyang Liu
|
Chaokun Wang
|
Hao Feng
|
Lingfei Wu
|
Liqun Yang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Online relevance matching is an essential task of e-commerce product search to boost the utility of search engines and ensure a smooth user experience. Previous work adopts either classical relevance matching models or Transformer-style models to address it. However, they ignore the inherent bipartite graph structures that are ubiquitous in e-commerce product search logs and are too inefficient to deploy online. In this paper, we design an efficient knowledge distillation framework for e-commerce relevance matching to integrate the respective advantages of Transformer-style models and classical relevance matching models. Especially for the core student model of the framework, we propose a novel method using k-order relevance modeling. The experimental results on large-scale real-world data (the size is 6 174 million) show that the proposed method significantly improves the prediction accuracy in terms of human relevance judgment. We deploy our method to JD.com online search platform. The A/B testing results show that our method significantly improves most business metrics under price sort mode and default sort mode.
TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks
Ruofan Hu
|
Dongyu Zhang
|
Dandan Tao
|
Thomas Hartvigsen
|
Hao Feng
|
Elke Rundensteiner
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Foodborne illness is a serious but preventable public health problem – with delays in detecting the associated outbreaks resulting in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks. TWEET-FID collected from Twitter is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsource workers. We introduce several domain tasks leveraging these three facets: text relevance classification (TRC), entity mention detection (EMD), and slot filling (SF). We describe the end-to-end methodology for dataset design, creation, and labeling for supporting model development for these tasks. A comprehensive set of results for these tasks leveraging state-of-the-art single-and multi-task deep learning methods on the TWEET-FID dataset are provided. This dataset opens opportunities for future research in foodborne outbreak detection.
Search
Co-authors
- Ziyang Liu 1
- Chaokun Wang 1
- Lingfei Wu 1
- Liqun Yang 1
- Ruofan Hu 1
- show all...