Coarse-to-Fine Pre-training for Named Entity Recognition
Xue Mengge, Bowen Yu, Zhenyu Zhang, Tingwen Liu, Yue Zhang, Bin Wang
Abstract
More recently, Named Entity Recognition has achieved great advances aided by pre-training approaches such as BERT. However, current pre-training techniques focus on building language modeling objectives to learn a general representation, ignoring the named entity-related knowledge. To this end, we propose a NER-specific pre-training framework to inject coarse-to-fine automatically mined entity knowledge into pre-trained models. Specifically, we first warm up the model via an entity span identification task by training it with Wikipedia anchors, which can be deemed as general-typed entities. Then we leverage the gazetteer-based distant supervision strategy to train the model to extract coarse-grained typed entities. Finally, we devise a self-supervised auxiliary task to mine the fine-grained named entity knowledge via clustering. Empirical studies on three public NER datasets demonstrate that our framework achieves significant improvements against several pre-trained baselines, establishing the new state-of-the-art performance on three benchmarks. Besides, we show that our framework gains promising results without using human-labeled training data, demonstrating its effectiveness in label-few and low-resource scenarios.
- Anthology ID:
- 2020.emnlp-main.514
- Volume:
- Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6345–6354
- URL:
- https://aclanthology.org/2020.emnlp-main.514
- DOI:
- 10.18653/v1/2020.emnlp-main.514
- Cite (ACL):
- Xue Mengge, Bowen Yu, Zhenyu Zhang, Tingwen Liu, Yue Zhang, and Bin Wang. 2020. Coarse-to-Fine Pre-training for Named Entity Recognition. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6345–6354, Online. Association for Computational Linguistics.
- Cite (Informal):
- Coarse-to-Fine Pre-training for Named Entity Recognition (Mengge et al., EMNLP 2020)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2020.emnlp-main.514.pdf
- Code
- strawberryx/CoFEE
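
The gazetteer-based distant supervision step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `distant_label`, the greedy longest-match strategy, the lowercasing, the BIO tag format, and the toy gazetteer are all assumptions made here for illustration, and the paper's actual matching procedure may differ.

```python
from typing import Dict, List, Tuple


def distant_label(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in a gazetteer.

    Hypothetical helper: `gazetteer` maps a lowercased token tuple to a
    coarse entity type such as "PER", "LOC", or "ORG".
    """
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so that longer entries
        # win over shorter entries that share the same prefix.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in gazetteer:
                etype = gazetteer[span]
                labels[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{etype}"
                i += n  # skip past the matched span
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Toy gazetteer, invented for this example.
gazetteer = {
    ("new", "york"): "LOC",
    ("new", "york", "times"): "ORG",
    ("ada", "lovelace"): "PER",
}
tokens = "Ada Lovelace reads the New York Times in New York".split()
for token, label in zip(tokens, distant_label(tokens, gazetteer)):
    print(f"{token}\t{label}")
```

Labels produced this way are noisy, since gazetteers are incomplete and string matching ignores context, which is one reason such data is typically used for pre-training rather than as a substitute for gold annotations.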