# Selective Prediction for Evaluating Confidence of Knowledge in Language Models

Source code for the paper "Selective Prediction for Evaluating Confidence of Knowledge in Language Models".

## Setup
1. Follow the [LAMA instructions](./LAMA_docs/README.md) to create the environment and download the data.
2. Install `lama` in editable mode: `pip install -e .`

## Usage
### 1. Run prediction

- i) Prediction

    Sample command for running prediction on SQuAD with BERT-base:

    ```bash
    python scripts/run_prediction.py --config expr/config/prediction_squad_bb.json
    ```

    The prediction results will be saved in `expr/output/`.

- ii) Create databases of the results

    ```bash
    python scripts/results_to_sqlite.py expr/output/
    ```

    An SQLite database file (`result.sqlite`) will be created for each result file (`result.pkl`).
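
To check that a database was written as expected, it can help to list its tables and row counts. The snippet below is a minimal sketch that assumes only that `result.sqlite` is a standard SQLite file. It makes no assumption about the schema produced by `results_to_sqlite.py`, and the helper name `summarize_sqlite` is hypothetical, not part of this repo.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Dict


def summarize_sqlite(db_path: str) -> Dict[str, int]:
    """Return a mapping of table name to row count for an SQLite file.

    Hypothetical helper for inspection only; it does not rely on any
    particular schema.
    """
    # Open read-only so that a mistyped path raises an error instead of
    # silently creating an empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from sqlite_master itself, so quoting them as
        # identifiers is sufficient here.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
```

Call `summarize_sqlite` with the path to any generated `result.sqlite` file to see which tables it contains.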

### 2. Compute confidence scores

Sample command for SQuAD with BERT-base:

```bash
python scripts/run_confidence.py --config config/confidence_bb_top100.json \
    --db expr/output/Squad/results/bert_base/ \
    --glob "**/result.sqlite"
```

For BERT-large experiments, use `config/confidence_bl_top100.json` instead.

### 3. Evaluation

Sample command for SQuAD with BERT-base:

```bash
python scripts/eval.py --config config/rcauc_config.json \
    expr/output/Squad/results/bert_base/ \
    > rcauc.jsonl
```
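
For readers unfamiliar with the metric suggested by the config name, the following is a minimal sketch of an area under the risk-coverage curve (RC-AUC) computation. It is an illustration under stated assumptions, not the implementation in `scripts/eval.py`: predictions are ranked by confidence, the risk at each coverage level is the error rate among the top-ranked predictions, and the area is approximated by averaging the risk over all coverage levels. The function name and the toy data are hypothetical.

```python
from typing import Sequence


def risk_coverage_auc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Approximate the area under the risk-coverage curve.

    Illustrative sketch only; the repo's own evaluation may use a
    different discretization or tie-breaking rule.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) == 0:
        raise ValueError("at least one prediction is required")

    # Rank predictions from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)

    errors = 0
    risk_sum = 0.0
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        # Risk at coverage k/n: error rate among the k most confident predictions.
        risk_sum += errors / k

    # Average risk over all n coverage levels.
    return risk_sum / len(order)


if __name__ == "__main__":
    # Toy data: confident predictions are correct, unconfident ones are wrong.
    print(risk_coverage_auc([0.9, 0.8, 0.3, 0.1], [True, True, False, False]))
```

Lower values are better: a model whose confidence ranks its correct predictions first accumulates little risk at low coverage.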


## Acknowledgements
A large portion of this repo is based on [LAMA](https://github.com/facebookresearch/LAMA). The documentation of the original repo can be found [here](./LAMA_docs/).

