# Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Positive and Unlabeled Learning

This zip file contains our source code, our data, and the environment specification file `requirements.txt`.

## Environment

We ran the code on a GPU with Python 3.7.6. You can install the required packages with:

```shell
pip install -r requirements.txt
```
Note that `requirements.txt` may contain some unnecessary packages.

## Training & Evaluation
The commands for running the code are in the `scripts` folder.

For Conf-MPU, run the corresponding commands in `determine_entity.txt`, `add_prob.txt`, and `NER_classification.txt`, in that order.
For example, to run Conf-MPU on BC5CDR_Dict_1.0 (i.e., BC5CDR (Big Dict) in our paper), first run:
```shell script
python pu_main.py --type bnPU --dataset BC5CDR_Dict_1.0 --flag Entity --m 10 --determine_entity True --embedding bio-embedding --epochs 100
```
Next, manually set the path of the model saved in the `saved_model` folder in the corresponding clause of `pu_main.py`, then run:
```shell script
python pu_main.py --type bnPU --dataset BC5CDR_Dict_1.0 --add_probs True --flag ALL --added_suffix entity_prob --embedding bio-embedding
```
This generates `train.ALL.txt.entity_prob` in the corresponding dataset folder. Finally, run:
```shell script
python pu_main.py --type entity_conf_mPU --dataset BC5CDR_Dict_1.0 --flag ALL --suffix entity_prob --m 28 --eta 0.5 --lr 0.0005 --loss MAE --embedding bio-embedding --epochs 100
```
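Because the three steps must run in order, with a manual edit of `pu_main.py` between the first and second, it can help to keep them in one place. The following is a minimal sketch assuming a POSIX shell; the file name `run_conf_mpu.sh`, the functions `conf_mpu_step1` to `conf_mpu_step3`, and the `DATASET`, `EMBEDDING`, and `DRY_RUN` variables are hypothetical and not part of this repository. It reuses only the flags from the commands above.

```shell
#!/bin/sh
# run_conf_mpu.sh: hypothetical helper, not part of this repository.
# Wraps the three Conf-MPU commands so dataset and embedding are set once.
# Set DRY_RUN=1 to print each command instead of executing it.

DATASET="${DATASET:-BC5CDR_Dict_1.0}"
EMBEDDING="${EMBEDDING:-bio-embedding}"
DRY_RUN="${DRY_RUN:-0}"

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1 (cf. determine_entity.txt): train the entity-determination model.
conf_mpu_step1() {
  run python pu_main.py --type bnPU --dataset "$DATASET" --flag Entity --m 10 \
    --determine_entity True --embedding "$EMBEDDING" --epochs 100
}

# Step 2 (cf. add_prob.txt): writes train.ALL.txt.entity_prob.
# Set the saved-model path in pu_main.py by hand before calling this.
conf_mpu_step2() {
  run python pu_main.py --type bnPU --dataset "$DATASET" --add_probs True \
    --flag ALL --added_suffix entity_prob --embedding "$EMBEDDING"
}

# Step 3 (cf. NER_classification.txt): train Conf-MPU on the generated file.
conf_mpu_step3() {
  run python pu_main.py --type entity_conf_mPU --dataset "$DATASET" --flag ALL \
    --suffix entity_prob --m 28 --eta 0.5 --lr 0.0005 --loss MAE \
    --embedding "$EMBEDDING" --epochs 100
}
```

For example, source it with `. ./run_conf_mpu.sh`, call `conf_mpu_step1`, edit `pu_main.py` as described above, and then call `conf_mpu_step2` and `conf_mpu_step3`. Setting `DRY_RUN=1` first prints the commands without running them, which is a cheap way to check the arguments.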

For MPU, run the corresponding commands in `NER_classification.txt`.
For example, to run MPU on BC5CDR_Dict_1.0:
```shell script
python pu_main.py --type mPU --dataset BC5CDR_Dict_1.0 --flag ALL --m 28 --lr 0.0005 --loss MAE --embedding bio-embedding --epochs 100
```

MPN is run in the same way as MPU.

For BNPU, run the corresponding commands in `NER_classification.txt` and `inference.txt`, in that order.
For example, to run BNPU on BC5CDR_Dict_1.0, first train one model per entity type:
```shell script
python pu_main.py --type bnPU --dataset BC5CDR_Dict_1.0 --flag Chemical --m 28 --embedding bio-embedding --epochs 100
python pu_main.py --type bnPU --dataset BC5CDR_Dict_1.0 --flag Disease --m 28 --embedding bio-embedding --epochs 100
```
Next, manually set the paths of the models saved in the `saved_model` folder in the corresponding clause of `pu_main.py`, then run:
```shell script
python pu_main.py --type bnPU --dataset BC5CDR_Dict_1.0 --inference True --embedding bio-embedding
``` 
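The per-type training commands differ only in `--flag`, so they can be generated with a loop. This is a minimal sketch assuming a POSIX shell; the functions `bnpu_train_all` and `bnpu_inference` and the variables `ENTITY_TYPES`, `DATASET`, `EMBEDDING`, and `DRY_RUN` are hypothetical. The default entity list (`Chemical Disease`) matches BC5CDR only; other datasets presumably need their own `--flag` values.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository: trains one bnPU model per
# entity type, then runs inference once the model paths are set in pu_main.py.
# ENTITY_TYPES defaults to the BC5CDR types; other datasets need their own list.

DATASET="${DATASET:-BC5CDR_Dict_1.0}"
EMBEDDING="${EMBEDDING:-bio-embedding}"
ENTITY_TYPES="${ENTITY_TYPES:-Chemical Disease}"
DRY_RUN="${DRY_RUN:-0}"

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One training run per entity type; stop at the first failure.
bnpu_train_all() {
  for flag in $ENTITY_TYPES; do  # unquoted on purpose: split on spaces
    run python pu_main.py --type bnPU --dataset "$DATASET" --flag "$flag" \
      --m 28 --embedding "$EMBEDDING" --epochs 100 || return 1
  done
}

# Inference; run only after editing the saved-model paths in pu_main.py.
bnpu_inference() {
  run python pu_main.py --type bnPU --dataset "$DATASET" --inference True \
    --embedding "$EMBEDDING"
}
```

Call `bnpu_train_all`, set the saved-model paths in `pu_main.py`, and then call `bnpu_inference`.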

Note that in the commands, the class weight parameter (γ in our paper) is denoted `m`, and the threshold parameter (τ in our paper) is denoted `eta`.
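To illustrate how these two flags are passed, the sketch below reruns the final Conf-MPU command for several values of `eta` (τ) while keeping `m` (γ) fixed. It is a hypothetical helper assuming a POSIX shell and an existing `train.ALL.txt.entity_prob`; the function `sweep_eta`, the variables `ETAS`, `M`, `DATASET`, and `DRY_RUN`, and the grid values are illustrative choices, not settings from our paper.

```shell
#!/bin/sh
# Hypothetical sweep, not part of this repository: reruns the final Conf-MPU
# command for several thresholds eta (tau in the paper), with the class
# weight m (gamma in the paper) held fixed. Grid values are illustrative.
# Assumes train.ALL.txt.entity_prob already exists for DATASET.

DATASET="${DATASET:-BC5CDR_Dict_1.0}"
M="${M:-28}"
ETAS="${ETAS:-0.3 0.5 0.7}"
DRY_RUN="${DRY_RUN:-0}"

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One Conf-MPU run per threshold; stop at the first failure.
sweep_eta() {
  for eta in $ETAS; do  # unquoted on purpose: split on spaces
    run python pu_main.py --type entity_conf_mPU --dataset "$DATASET" \
      --flag ALL --suffix entity_prob --m "$M" --eta "$eta" --lr 0.0005 \
      --loss MAE --embedding bio-embedding --epochs 100 || return 1
  done
}
```

The sketch does not separate the outputs of successive runs, so check how `pu_main.py` names its saved models before comparing results across thresholds.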
