WkNER: Enhancing Named Entity Recognition with Word Segmentation Constraints and kNN Retrieval

Yanchun Li, Senlin Deng, Dongsu Shen, Shujuan Tian, Saiqin Long


Abstract
Fine-tuning Pre-trained Language Models (PLMs) is a popular Natural Language Processing (NLP) paradigm for addressing Named Entity Recognition (NER) tasks. However, neural network models often generalize poorly, both because of significant disparities between the knowledge learned by PLMs and the distribution of the target dataset and because of data scarcity. In addition, token omission in predictions due to insufficient learning remains a challenge in NER. In this paper, we propose a kNN retrieval enhancement algorithm (WkNER) that incorporates word segmentation information to enhance the model’s generalization ability and alleviate the problem of missing entity tokens in prediction. First, word segmentation information is used to preliminarily determine entity boundaries and to alleviate a common prediction error of the fine-tuned model, namely missing tokens within entities. Second, we find that non-entities in the retrieval table contain a large amount of redundant information, and we explore the effects of introducing non-entity information of different scales on the model. Experimental results show that our proposed method significantly improves the performance of baseline models and achieves recognition accuracy better than or comparable to that of previous state-of-the-art models on multiple public Chinese and English datasets. In low-resource scenarios in particular, our method achieves higher accuracy with only 20% of the dataset than the original method does with the full dataset.
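The two ideas named in the abstract, kNN retrieval over a table of labeled tokens and word segmentation as a boundary constraint, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic kNN-augmented tagger in which a retrieved label distribution is linearly interpolated with the model's distribution, and a simple rule that extends an entity over a whole segmented word. The function names, the squared Euclidean distance, the temperature, and the interpolation weight `lam` are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def knn_label_distribution(query, datastore, k=3, temperature=1.0):
    """Label distribution from the k nearest (vector, label) entries.

    Assumption: squared Euclidean distance and softmax-style weights.
    """
    scored = sorted(
        (sum((q - v) ** 2 for q, v in zip(query, vec)), label)
        for vec, label in datastore
    )[:k]
    weights = defaultdict(float)
    for dist, label in scored:
        weights[label] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the model's and the retrieved distributions (lam is hypothetical)."""
    labels = set(model_probs) | set(knn_probs)
    return {
        label: lam * knn_probs.get(label, 0.0)
        + (1 - lam) * model_probs.get(label, 0.0)
        for label in labels
    }


def fill_word_gaps(labels, word_spans):
    """Extend an entity over a whole segmented word (BIO labels).

    word_spans holds (start, end) token indices, end exclusive. A word is
    only touched when its non-O tokens all share one entity type.
    """
    fixed = list(labels)
    for start, end in word_spans:
        types = {l.split("-", 1)[1] for l in labels[start:end] if l != "O"}
        if len(types) != 1:
            continue
        etype = next(iter(types))
        if fixed[start] == "O":
            fixed[start] = "B-" + etype
        for i in range(start + 1, end):
            fixed[i] = "I-" + etype
    return fixed


# Toy data, invented for this example.
datastore = [([0.0, 0.0], "O"), ([1.0, 1.0], "B-PER"), ([1.1, 0.9], "B-PER")]
knn_probs = knn_label_distribution([1.0, 1.0], datastore, k=2)
mixed = interpolate({"O": 0.6, "B-PER": 0.4}, knn_probs, lam=0.5)
repaired = fill_word_gaps(["B-LOC", "O", "I-LOC", "O"], [(0, 3), (3, 4)])
```

In this toy run the model alone prefers `O`, the two nearest neighbors both carry `B-PER`, and the mixed distribution prefers `B-PER`; the segmentation rule fills the `O` gap inside the three-token word. How WkNER actually builds its retrieval table, weights neighbors, and applies segmentation is described in the paper itself.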
Anthology ID:
2024.lrec-main.1535
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
17651–17663
URL:
https://aclanthology.org/2024.lrec-main.1535
Cite (ACL):
Yanchun Li, Senlin Deng, Dongsu Shen, Shujuan Tian, and Saiqin Long. 2024. WkNER: Enhancing Named Entity Recognition with Word Segmentation Constraints and kNN Retrieval. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17651–17663, Torino, Italia. ELRA and ICCL.
Cite (Informal):
WkNER: Enhancing Named Entity Recognition with Word Segmentation Constraints and kNN Retrieval (Li et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.1535.pdf
Optional supplementary material:
2024.lrec-main.1535.OptionalSupplementaryMaterial.zip