Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching
Zhuoran Jin, Pengfei Cao, Zhitao He, Yubo Chen, Kang Liu, Jun Zhao
Abstract
Despite the significant progress in developing named entity recognition models, scaling to novel-emerging types still remains challenging in real-world scenarios. Continual learning and zero-shot learning approaches have been explored to handle novel-emerging types with less human supervision, but they have not been as successfully adopted as supervised approaches. Meanwhile, humans possess a much larger vocabulary size than these approaches and have the ability to learn the alignment between entities and concepts effortlessly through natural supervision. In this paper, we consider a more realistic and challenging setting called open-vocabulary named entity recognition (OVNER) to imitate human-level ability. OVNER aims to recognize entities in novel types by their textual names or descriptions. Specifically, we formulate OVNER as a semantic matching task and propose a novel and scalable two-stage method called Context-Type SemAntiC Alignment and FusiOn (CACAO). In the pre-training stage, we adopt Dual-Encoder for context-type semantic alignment and pre-train Dual-Encoder on 80M context-type pairs which are easily accessible through natural supervision. In the fine-tuning stage, we use Cross-Encoder for context-type semantic fusion and fine-tune Cross-Encoder on base types with human supervision. Experimental results show that our method outperforms the previous state-of-the-art methods on three challenging OVNER benchmarks by 9.7%, 9.5%, and 1.8% F1-score of novel types. Moreover, CACAO also demonstrates its flexible transfer ability in cross-domain NER.- Anthology ID:
- 2023.findings-emnlp.974
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 14616–14637
- Language:
- URL:
- https://aclanthology.org/2023.findings-emnlp.974
- DOI:
- 10.18653/v1/2023.findings-emnlp.974
- Cite (ACL):
- Zhuoran Jin, Pengfei Cao, Zhitao He, Yubo Chen, Kang Liu, and Jun Zhao. 2023. Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14616–14637, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching (Jin et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.974.pdf