Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching

Zhuoran Jin, Pengfei Cao, Zhitao He, Yubo Chen, Kang Liu, Jun Zhao


Abstract
Despite the significant progress in developing named entity recognition models, scaling to novel-emerging types still remains challenging in real-world scenarios. Continual learning and zero-shot learning approaches have been explored to handle novel-emerging types with less human supervision, but they have not been as successfully adopted as supervised approaches. Meanwhile, humans possess a much larger vocabulary size than these approaches and have the ability to learn the alignment between entities and concepts effortlessly through natural supervision. In this paper, we consider a more realistic and challenging setting called open-vocabulary named entity recognition (OVNER) to imitate human-level ability. OVNER aims to recognize entities in novel types by their textual names or descriptions. Specifically, we formulate OVNER as a semantic matching task and propose a novel and scalable two-stage method called Context-Type SemAntiC Alignment and FusiOn (CACAO). In the pre-training stage, we adopt Dual-Encoder for context-type semantic alignment and pre-train Dual-Encoder on 80M context-type pairs which are easily accessible through natural supervision. In the fine-tuning stage, we use Cross-Encoder for context-type semantic fusion and fine-tune Cross-Encoder on base types with human supervision. Experimental results show that our method outperforms the previous state-of-the-art methods on three challenging OVNER benchmarks by 9.7%, 9.5%, and 1.8% F1-score of novel types. Moreover, CACAO also demonstrates its flexible transfer ability in cross-domain NER.
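The alignment stage described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the encoders, the training objective, or how contexts and types are represented. The sketch assumes a toy hashed bag-of-words `embed` function in place of a pre-trained encoder, a symmetric in-batch contrastive loss as the alignment objective, and a hypothetical `match_types` helper that picks the best-scoring type description for a context. The Cross-Encoder fusion stage is not shown.

```python
import zlib
import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for an encoder (assumption): hashed bag-of-words, L2-normalised."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        # zlib.crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def similarity_matrix(contexts, types) -> np.ndarray:
    """Dual-encoder scoring: contexts and types are encoded independently,
    then compared with a dot product (cosine similarity, since both are normalised)."""
    c = np.stack([embed(x) for x in contexts])
    t = np.stack([embed(x) for x in types])
    return c @ t.T


def contrastive_loss(sim: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric in-batch contrastive loss (assumed objective).
    Row i of `sim` is expected to match column i; other columns act as negatives."""
    logits = sim / temperature

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


def match_types(context: str, type_descriptions):
    """Hypothetical helper: score one context against every type name or description."""
    scores = similarity_matrix([context], type_descriptions)[0]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    contexts = ["she moved to paris last year", "he joined google in 2010"]
    types = ["city or location", "company or organization"]
    print(contrastive_loss(similarity_matrix(contexts, types)))
    print(match_types(contexts[0], types))
```

Because the type side is encoded from text alone, a new type can be added at inference time by supplying its name or description, which is the property the open-vocabulary setting relies on. A real system would replace `embed` with learned encoders so that semantically related contexts and types score highly even without shared tokens.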
Anthology ID:
2023.findings-emnlp.974
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14616–14637
URL:
https://aclanthology.org/2023.findings-emnlp.974
DOI:
10.18653/v1/2023.findings-emnlp.974
Cite (ACL):
Zhuoran Jin, Pengfei Cao, Zhitao He, Yubo Chen, Kang Liu, and Jun Zhao. 2023. Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14616–14637, Singapore. Association for Computational Linguistics.
Cite (Informal):
Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching (Jin et al., Findings 2023)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.974.pdf