# Language Model-Based Determinantal Point Process (LM-DPP)

## Requirements
```bash
conda create -n lm-dpp python=3.7
conda activate lm-dpp
```
Then install the required packages:
```bash
pip install -r requirements.txt
```
**Data**

All datasets used are publicly available and can be downloaded online.

## Selection and Inference Pipeline
```bash
sentence_transformer_model=paraphrase-mpnet-base-v2
task_name=copa
selective_annotation_method=lm-dpp
# Local directory for the GPT-J-6B language model
model_cache_dir=./models/gpt-j-6B
# Local directory where the datasets are cached
data_cache_dir=./datasets
# Local path to the Sentence-BERT model
embedding_model=./models/$sentence_transformer_model
annotation_size=100
# Results go to one directory per (embedding model, task, method, budget)
output_dir=./outputs/$sentence_transformer_model/$task_name/$selective_annotation_method-annotation_size_$annotation_size

python main.py \
--task_name $task_name \
--selective_annotation_method $selective_annotation_method \
--model_cache_dir $model_cache_dir \
--data_cache_dir $data_cache_dir \
--embedding_model $embedding_model \
--annotation_size $annotation_size \
--output_dir $output_dir
```

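- **selective_annotation_method**: the example selection method; supported options include `random`, `mfl`, `fast_votek`, and `lm-dpp`.
- **embedding_model**: path to the Sentence-BERT model used to compute example embeddings.
- **annotation_size**: the annotation budget, i.e., the number of examples selected for annotation (e.g., `16` or `100`).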
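To compare selection methods, the pipeline command can be repeated for each method and budget. The script below is a hypothetical sketch, not part of the repository: it reuses the variables and flags from the pipeline command above and assumes `main.py` accepts exactly those flags. The function name `sweep` and the `DRY_RUN` switch are illustrative; by default the script only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Hypothetical sweep script (not part of the repository): repeats the pipeline
# for every listed selection method and both example annotation budgets.
# Assumes main.py accepts exactly the flags used in the pipeline command.
set -euo pipefail

sentence_transformer_model=paraphrase-mpnet-base-v2
task_name=copa
model_cache_dir=./models/gpt-j-6B
data_cache_dir=./datasets
embedding_model=./models/$sentence_transformer_model
# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 actually runs it.
DRY_RUN=${DRY_RUN:-1}

sweep() {
  local method size output_dir
  local -a cmd
  for method in random mfl fast_votek lm-dpp; do
    for size in 16 100; do
      # Same output layout as the single-run command above
      output_dir=./outputs/$sentence_transformer_model/$task_name/$method-annotation_size_$size
      cmd=(python main.py
        --task_name "$task_name"
        --selective_annotation_method "$method"
        --model_cache_dir "$model_cache_dir"
        --data_cache_dir "$data_cache_dir"
        --embedding_model "$embedding_model"
        --annotation_size "$size"
        --output_dir "$output_dir")
      if [ "$DRY_RUN" = 1 ]; then
        echo "${cmd[*]}"
      else
        "${cmd[@]}"
      fi
    done
  done
}

sweep
```

Saved as, say, `sweep.sh`, running `bash sweep.sh` previews the eight commands, and `DRY_RUN=0 bash sweep.sh` executes them one after another. Running sequentially keeps only one GPT-J-6B instance in memory at a time.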

## Reproducing GPT-3 Results
Go to the `LLMs` folder and run the corresponding `.ipynb` notebook. Before running it, set the task and selection method in the notebook to match your experiment.
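Once the task and selection method have been set, a notebook can also be executed non-interactively. A minimal sketch, assuming Jupyter with `nbconvert` is installed in the environment (it may not be listed in `requirements.txt`). The script, the `run_notebook` function, and the `DRY_RUN` switch are hypothetical, and the notebook file name is left as a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): execute one already-edited
# notebook headlessly with nbconvert, saving the executed outputs in place.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}  # 1 = print the command only, 0 = run it

run_notebook() {
  local notebook=$1
  local -a cmd=(jupyter nbconvert --to notebook --execute --inplace "$notebook")
  if [ "$DRY_RUN" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage: DRY_RUN=0 bash run_notebook.sh LLMs/<notebook>.ipynb
if [ "$#" -gt 0 ]; then
  run_notebook "$1"
fi
```

`--inplace` overwrites the notebook with its executed version, so the GPT-3 outputs stay next to the code that produced them; drop it to write a separate `.nbconvert.ipynb` copy instead.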