Abstract
In this paper, we study the named entity recognition (NER) problem under distant supervision. Due to the incompleteness of the external dictionaries and/or knowledge bases, such distantly annotated training data usually suffer from a high false negative rate. To this end, we formulate the Distantly Supervised NER (DS-NER) problem via Multi-class Positive and Unlabeled (MPU) learning and propose a theoretically and practically novel CONFidence-based MPU (Conf-MPU) approach. To handle the incomplete annotations, Conf-MPU consists of two steps. First, a confidence score is estimated for each token, indicating how likely it is to be an entity token. Then, the proposed Conf-MPU risk estimation is applied to train a multi-class classifier for the NER task. Thorough experiments on two benchmark datasets labeled by various external knowledge sources demonstrate the superiority of the proposed Conf-MPU over existing DS-NER methods. Our code is available on GitHub.
- Anthology ID:
- 2022.acl-long.498
- Volume:
- Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7198–7211
- URL:
- https://aclanthology.org/2022.acl-long.498
- DOI:
- 10.18653/v1/2022.acl-long.498
- Cite (ACL):
- Kang Zhou, Yuepei Li, and Qi Li. 2022. Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7198–7211, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning (Zhou et al., ACL 2022)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2022.acl-long.498.pdf
- Code:
- kangISU/Conf-MPU-DS-NER
- Data:
- BC5CDR, CoNLL-2003
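The abstract describes Conf-MPU's second step as a risk estimator that trains a multi-class classifier from distantly labeled entity tokens plus unlabeled tokens. As background for that step, here is a minimal sketch of a generic non-negative multi-class PU (MPU) risk. It rests on the decomposition p(x) = π_0 p_0(x) + Σ_i π_i p_i(x) (label 0 = non-entity) and on the zero-clipping trick of non-negative PU learning. It is not the paper's Conf-MPU estimator, because it omits the token-level confidence scores from the first step. The function names, the class priors and the toy probabilities are hypothetical, and unlabeled tokens are assumed to be drawn from the marginal distribution p(x).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

EPS = 1e-12  # guards against log(0)


def nll(probs: Sequence[float], label: int) -> float:
    """Cross-entropy loss of one token: -log p(label | x)."""
    return -math.log(max(probs[label], EPS))


def nn_mpu_risk(
    positives: List[Tuple[Sequence[float], int]],
    unlabeled: List[Sequence[float]],
    priors: Dict[int, float],
) -> float:
    """Generic non-negative MPU risk (hypothetical helper, not Conf-MPU).

    positives: (predicted class probabilities, entity label in 1..k) pairs
               for tokens matched by the dictionary.
    unlabeled: predicted class probabilities for tokens assumed to be
               drawn from the marginal distribution p(x).
    priors:    assumed class priors pi_i for entity labels 1..k.
    """
    by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for probs, label in positives:
        by_class[label].append(probs)

    positive_risk = 0.0    # sum_i pi_i * E_{P_i}[loss(f(x), i)]
    positive_as_neg = 0.0  # sum_i pi_i * E_{P_i}[loss(f(x), 0)]
    for label, samples in by_class.items():
        pi = priors[label]
        positive_risk += pi * sum(nll(p, label) for p in samples) / len(samples)
        positive_as_neg += pi * sum(nll(p, 0) for p in samples) / len(samples)

    # E_U[loss(f(x), 0)] mixes non-entity and entity tokens; subtracting the
    # entity part leaves an estimate of pi_0 * E_N[loss(f(x), 0)].
    unlabeled_as_neg = sum(nll(p, 0) for p in unlabeled) / len(unlabeled)
    negative_risk = unlabeled_as_neg - positive_as_neg

    # Clip at zero, as in non-negative PU learning, so the estimated
    # non-entity risk cannot go negative on a small sample.
    return positive_risk + max(0.0, negative_risk)


if __name__ == "__main__":
    # Toy example: one entity class (label 1) with an assumed prior of 0.5.
    pos = [([0.5, 0.5], 1)]
    unl = [[0.5, 0.5]]
    print(nn_mpu_risk(pos, unl, {1: 0.5}))  # ≈ 0.6931 (= ln 2)
```

The zero clip matters in DS-NER because the subtracted entity term can exceed the unlabeled term when the model overfits the dictionary-matched tokens; without it the estimate, and hence the training objective, can become negative.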