Zijun Yao
2021
Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making
Zijun Yao, Chengjiang Li, Tiansi Dong, Xin Lv, Jifan Yu, Lei Hou, Juanzi Li, Yichi Zhang, Zelin Dai
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Entity Matching (EM) aims to recognize entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require large amounts of annotated data for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction, which decouples feature representation from matching decisions. Using self-supervised learning and the masking mechanism of pre-trained language models, HIF learns embeddings of noisy attribute values through inter-attribute attention on unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted as entity matching rules whose structure is advocated by domain experts. Experiments on 6 public datasets and 3 industrial datasets show that our method is highly efficient and outperforms state-of-the-art (SOTA) EM models in most cases. We will release the code upon acceptance.
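The decision-making half described above (per-attribute comparison features fed into a small decision tree that reads as matching rules) can be illustrated with a minimal sketch. This is not the authors' KAT Induction. The token-Jaccard comparison feature, the greedy Gini split on observed feature values, the depth limit, the toy product records, and every name (`jaccard`, `compare`, `induce`, `rules`) are hypothetical choices made for illustration. The HIF embedding step is omitted entirely.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (hypothetical comparison feature)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def compare(left: dict, right: dict, attrs: list) -> list:
    """One similarity feature per attribute for a candidate record pair."""
    return [jaccard(left.get(k, ""), right.get(k, "")) for k in attrs]


@dataclass
class Node:
    label: Optional[int] = None    # leaf prediction: 1 = match, 0 = non-match
    feature: Optional[int] = None  # attribute index tested at an inner node
    threshold: float = 0.0
    left: Optional[Node] = None    # taken when x[feature] <= threshold
    right: Optional[Node] = None   # taken when x[feature] > threshold


def gini(labels: list) -> float:
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels: list) -> int:
    return int(2 * sum(labels) >= len(labels))


def induce(X: list, y: list, depth: int = 0, max_depth: int = 2) -> Node:
    """Greedy CART-style induction; shallow trees stay human-readable."""
    if depth == max_depth or len(set(y)) <= 1:
        return Node(label=majority(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            li = [i for i, x in enumerate(X) if x[f] <= t]
            ri = [i for i, x in enumerate(X) if x[f] > t]
            if not li or not ri:
                continue
            score = (len(li) * gini([y[i] for i in li])
                     + len(ri) * gini([y[i] for i in ri])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, li, ri)
    if best is None:
        return Node(label=majority(y))
    _, f, t, li, ri = best
    sub = lambda idx: induce([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth)
    return Node(feature=f, threshold=t, left=sub(li), right=sub(ri))


def predict(node: Node, x: list) -> int:
    while node.label is None:
        node = node.right if x[node.feature] > node.threshold else node.left
    return node.label


def rules(node: Node, attrs: list, path: tuple = ()):
    """Yield one readable matching rule per leaf that predicts a match."""
    if node.label is not None:
        if node.label == 1:
            yield " AND ".join(path) or "TRUE"
        return
    name = attrs[node.feature]
    yield from rules(node.left, attrs, path + (f"sim({name}) <= {node.threshold:.2f}",))
    yield from rules(node.right, attrs, path + (f"sim({name}) > {node.threshold:.2f}",))


# Toy labeled pairs (invented for illustration): (left record, right record, label).
attrs = ["name", "brand"]
pairs = [
    ({"name": "iphone 12 pro", "brand": "apple"}, {"name": "iphone 12 pro", "brand": "apple"}, 1),
    ({"name": "galaxy s21 ultra", "brand": "samsung"}, {"name": "galaxy s21 ultra 5g", "brand": "samsung"}, 1),
    ({"name": "iphone 12 pro", "brand": "apple"}, {"name": "iphone 11", "brand": "apple"}, 0),
    ({"name": "pixel 6", "brand": "google"}, {"name": "galaxy s21 ultra", "brand": "samsung"}, 0),
]
X = [compare(l, r, attrs) for l, r, _ in pairs]
y = [lbl for _, _, lbl in pairs]
tree = induce(X, y)
print(list(rules(tree, attrs)))  # ['sim(name) > 0.25']
```

Keeping feature computation separate from the tree means the labeled pairs are used only to fit a handful of thresholds, and each leaf that predicts a match can be printed as an explicit rule. In the paper's framework, the comparison features would instead be derived from HIF's learned attribute embeddings.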