# SDNet

- An implementation of ``Few-shot Named Entity Recognition with Self-describing Networks``

## Quick links
* [Environment](#Environment)
* [Dataset](#Dataset)
* [Fewshot Fine-tuning](#Fewshot-Fine-tuning)
* [Model Evaluation](#Model-Evaluation)

### Environment

```bash
conda env create -n sdnet -f environment.yml
conda activate sdnet
pip install -r requirements.txt
```


### Dataset
Datasets are placed in the `data` folder:

```text
data/dataset
├── test.json
├── kshot.json/full.json
└── mapping.json
```

`dataset` can be one of: CoNLL, WNUT17, mit-rest, mit-movie, mit-movie2, re3d, OntoNote, i2b2.

`test.json` is the test data file; each line is a Dict holding one instance.

Instance format: each instance is a Dict with `tokens` and `entity` fields, where `tokens` is the list of tokens and `entity` is the list of entity mentions.
```text
{
    "tokens": [token1, token2, ...],
    "entity": [
        {"text": mention1, "type": type1, "offset": [startindex1, endindex1]},
        {"text": mention2, "type": type2, "offset": [startindex2, endindex2]},
        ...
    ]
}
```
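
The instance format above can be checked with a small reader. This is a minimal sketch, not part of the repository: the helper names `parse_instances` and `load_instances` are hypothetical, and it assumes `entity` is a flat list of mention Dicts, as the description states. It does not interpret `offset`, since the README does not say whether the end index is inclusive.

```python
import json
from typing import Any, Dict, Iterable, List


def parse_instances(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON Lines text into instances and check the documented fields."""
    instances = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skipping blank lines is an assumption, not a documented rule
        instance = json.loads(line)
        for field in ("tokens", "entity"):
            if field not in instance:
                raise ValueError(f"line {line_no}: missing field {field!r}")
        for mention in instance["entity"]:
            missing = {"text", "type", "offset"} - set(mention)
            if missing:
                raise ValueError(f"line {line_no}: mention lacks {sorted(missing)}")
        instances.append(instance)
    return instances


def load_instances(path: str) -> List[Dict[str, Any]]:
    """Read a file such as data/<dataset>/test.json, one instance per line."""
    with open(path, encoding="utf-8") as handle:
        return parse_instances(handle)


# Made-up example line; the offset values are illustrative only.
example = json.dumps({
    "tokens": ["Alice", "visited", "Paris"],
    "entity": [{"text": "Alice", "type": "PER", "offset": [0, 1]}],
})
print(parse_instances([example])[0]["entity"][0]["type"])
```

Calling `load_instances` on a `test.json` file returns the same list of Dicts and raises `ValueError` on the first line that lacks a documented field.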

+ `kshot.json`: the data file for k-shot fine-tuning. Each line is a Dict with `support` and `target_label` fields, where `support` is the list of instances in the support set and `target_label` is the list of target novel entity types.
+ `full.json`: the data file for full-shot fine-tuning. Each line is a Dict with `support` and `target_label` fields, where `support` is the list of instances in the support set (the full training set) and `target_label` is the list of target novel entity types.
+ `test.json`: each line is an instance.
+ `mapping.json`: a Dict whose keys are label names and whose values are the mapping words for each label (commonly the label name itself).
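
As an illustration of the file descriptions above, the following sketch builds one `kshot.json` line and checks it against a mapping. The helpers `make_support_record` and `unmapped_labels` are hypothetical, the labels and mapping words are made up, and the sketch assumes each mapping value is a single string.

```python
import json
from typing import Any, Dict, List


def make_support_record(support: List[Dict[str, Any]],
                        target_label: List[str]) -> Dict[str, Any]:
    """Build one record with the two fields described for kshot.json/full.json."""
    return {"support": support, "target_label": target_label}


def unmapped_labels(record: Dict[str, Any], mapping: Dict[str, Any]) -> List[str]:
    """Return the target labels that have no entry in the mapping Dict."""
    return [label for label in record["target_label"] if label not in mapping]


# Made-up support instance; the offset values are illustrative only.
instance = {
    "tokens": ["Alice", "visited", "Paris"],
    "entity": [{"text": "Alice", "type": "PER", "offset": [0, 1]}],
}
record = make_support_record([instance], ["PER", "LOC"])
mapping = {"PER": "person"}  # assumed shape: label name -> mapping word

kshot_line = json.dumps(record)  # one record per line in the data file
print(unmapped_labels(record, mapping))
```

A check like this can catch a label listed in `target_label` that was forgotten in `mapping.json` before fine-tuning starts.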

### Fewshot Fine-tuning
Run:
```bash
python fsner.py -dataset dataset -K 5 -sdnet
```
+ `-dataset`: one of CoNLL, WNUT17, mit-rest, mit-movie, mit-movie2, re3d, OntoNote, i2b2
+ `-sdnet`: fine-tune from our pre-trained SDNet; if omitted, `t5-base` is used
+ `-K`: the number of shots; default is 5
+ `-full`: if set, fine-tune on the full training set


The model checkpoint is saved in `tmp/dataset/...`

### Model Evaluation
To evaluate a fine-tuned model, add `-evalue`:
```bash
python fsner.py -dataset dataset -sdnet -K 5 -evalue
```
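
When fine-tuning and evaluating several datasets in a row, the commands can be assembled programmatically. The sketch below is one possible way to do it: `build_command` is a hypothetical helper, not part of the repository, and it uses only the flags documented in this README. Whether `-K` is ignored when `-full` is set is not stated here, so the sketch passes both.

```python
from typing import List


def build_command(dataset: str, k: int = 5, sdnet: bool = True,
                  full: bool = False, evaluate: bool = False) -> List[str]:
    """Assemble an fsner.py argument list from the documented flags."""
    cmd = ["python", "fsner.py", "-dataset", dataset, "-K", str(k)]
    if sdnet:
        cmd.append("-sdnet")   # start from the pre-trained SDNet instead of t5-base
    if full:
        cmd.append("-full")    # fine-tune on the full training set
    if evaluate:
        cmd.append("-evalue")  # evaluation mode
    return cmd


# Print the fine-tuning and evaluation commands for two datasets.
for name in ["CoNLL", "WNUT17"]:
    print(" ".join(build_command(name)))
    print(" ".join(build_command(name, evaluate=True)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to run the step.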

