## WkNER: Enhancing Named Entity Recognition with Word Segmentation Constraints and kNN Retrieval

This repository contains the code for our paper **"WkNER: Enhancing Named Entity Recognition with Word Segmentation Constraints and kNN Retrieval"**, together with supplementary experiments.


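## Requirements

Create and activate a virtual environment (replace `your venv` with a name of your choice), then install the dependencies:
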
```
python -m venv "your venv"
source ./"your venv"/bin/activate
pip install -r requirements.txt
```

## Pre-trained models

The pre-trained models used in this paper are as follows:

**Chinese Models**

- BERT-base: https://huggingface.co/bert-base-chinese
- RoBERTa-base: https://huggingface.co/hfl/chinese-roberta-wwm-ext
- RoBERTa-large: https://huggingface.co/hfl/chinese-roberta-wwm-ext-large
- ChineseBERT-base: https://huggingface.co/iioSnail/ChineseBERT-base
- ChineseBERT-large: https://huggingface.co/iioSnail/ChineseBERT-large

**English Models**

- BERT-base: https://huggingface.co/bert-base-cased
- BERT-large: https://huggingface.co/bert-large-cased
- RoBERTa-large: https://huggingface.co/roberta-large

## Datasets

The datasets used in the experiments are located in `./data/ner_data`, under the root directory of this project. They can also be downloaded from a cloud drive: https://drive.google.com/drive/folders/1AnrxcbRaVNy1ibDucI3raauMxatYar6-?usp=sharing
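
Before running any script, it can help to confirm that the data is actually in place. The snippet below is a minimal sketch using only the standard library; `list_dataset_files` is a hypothetical helper written for this README and is not part of the repository.

```python
from pathlib import Path


def list_dataset_files(data_dir="./data/ner_data"):
    """Return sorted relative paths of all files under data_dir.

    Hypothetical helper: returns an empty list if the directory is missing.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


# Run from the project root; an empty result means the data has not been placed yet.
files = list_dataset_files()
print(f"{len(files)} file(s) found under ./data/ner_data")
```

If the count is zero, download the data from the cloud drive and place it under `./data/ner_data`.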

### Word segmentation information extraction

Before using the WkNER algorithm, the word segmentation information of the test data must be extracted. We already provide it in `./data/ner_data`. To generate it yourself, run:

```
python classes/init.py
```

## Train and test

### Train

The training commands for each model used in the experiments can be found in `./scripts` and `./classes/ChineseBert/scripts`.

### Test

To test the WkNER algorithm, use the hyperparameter settings stored alongside the training scripts. Note that the search table is constructed during model inference, so its storage location must be set beforehand in `./classes/init.py` and `./classes/ChineseBert/init.py`.
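
This README does not describe how the search table is laid out, so the following is only a generic illustration of kNN label retrieval, not the repository's implementation. `SearchTable`, `add`, `query`, and the toy two-dimensional vectors are hypothetical names and data chosen for this sketch. The use of cosine similarity and majority voting is likewise an assumption.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class SearchTable:
    """Hypothetical in-memory store of (token representation, label) pairs."""

    def __init__(self):
        self.keys = []    # representation vectors
        self.labels = []  # entity label attached to each vector

    def add(self, vector, label):
        """Store one representation together with its label."""
        self.keys.append(list(vector))
        self.labels.append(label)

    def query(self, vector, k=3):
        """Return the majority label among the k most similar stored entries."""
        if not self.keys:
            raise ValueError("the search table is empty")
        ranked = sorted(
            range(len(self.keys)),
            key=lambda i: cosine(vector, self.keys[i]),
            reverse=True,
        )
        votes = Counter(self.labels[i] for i in ranked[:k])
        return votes.most_common(1)[0][0]


# Toy example: fill the table, then retrieve a label for a new vector.
table = SearchTable()
table.add([1.0, 0.0], "B-PER")
table.add([0.9, 0.1], "B-PER")
table.add([0.0, 1.0], "O")
prediction = table.query([0.8, 0.2], k=2)
print(prediction)
```

In a real setting the vectors would come from the encoder's hidden states, and the table would be written to the storage location configured in the `init.py` files rather than kept in memory.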
