# "I Never Said That": A dataset, typology and baselines on response clarity classification

This repository contains two folders: one with the dataset and one with the classification results produced by all the models presented in the paper.

The dataset folder includes the following files:

- QAEvasion.csv: a file containing the dataset.
- Inter-Annotator Agreement folder: annotations from each annotator for corresponding parts.
- Counterfactual Summaries folder: counterfactual summaries (and the results of GPT-3.5 Turbo) for each part, along with user annotations.
  
The dataset, along with the model, will be published on Hugging Face.
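
For a first look at the dataset file, the following is a minimal sketch that lists the column names and counts the rows. It assumes that QAEvasion.csv is a standard comma-separated file with a header row; the helper name `summarize_csv` and the path in the usage comment are illustrative and not part of the repository.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Iterating consumes the data rows; the header is read on first access.
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return columns, row_count


# Example usage (assumed path; adjust to where QAEvasion.csv lives in your checkout):
# columns, n_rows = summarize_csv("dataset/QAEvasion.csv")
# print(columns, n_rows)
```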

## Installation
- Install the dependencies: `pip install -r requirements.txt`

## 1. Dataset Analysis

### 1.1 Statistics of the Dataset
To obtain statistics of the dataset, run the following command:
```
>>> python datasetAnalysis.py
```

### 1.2 Analysis of Counterfactual Summaries
To analyze counterfactual summaries, execute the following command:
```
>>> python counterfactual_summaries_analysis.py
```

## 2. Zero-Shot Inference
### 2.1 Zero-Shot Inference on Open-source Models
For the Falcon-40b model (the same applies to any other Hugging Face model):
```
>>> python zero_shot_.py --model_name "tiiuae/falcon-40b" --output_file "falcon_40b_zero_shot_clarity.pickle"
```
```
>>> python zero_shot_.py --model_name "tiiuae/falcon-40b" --output_file "falcon_40b_zero_shot_evasion.pickle" --add_specific_labels
```
### 2.2 Zero-Shot Inference on GPT-3.5 Turbo
For the direct clarity problem:
```
>>> python chatgpt_zero_shot_.py --token ... --output_file "falcon_40b_zero_shot_clarity.pickle" 
```
For the evasion-based clarity problem:
```
>>> python chatgpt_zero_shot_.py --token ... --output_file "falcon_40b_zero_shot_evasion.pickle" --add_specific_labels
```
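
The zero-shot scripts write their output to the `.pickle` file given by `--output_file`. The structure of the stored object is not documented here, so the sketch below only loads the file and reports the type and size of what it contains. The helper names `load_predictions` and `describe` are illustrative. Only unpickle files you produced yourself or otherwise trust, since loading a pickle can execute arbitrary code.

```python
import pickle


def load_predictions(path):
    """Load whatever object a zero-shot run stored in its output file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj):
    """Return (type_name, length); length is None for objects without len()."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


# Example usage with the output file name from the commands above:
# preds = load_predictions("falcon_40b_zero_shot_clarity.pickle")
# print(describe(preds))
```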

## 3. Training Your Own Model
Using lora.py, you can train the model with the following arguments:

- model_name
- train_size (default: 2700 samples)
- annotators_ids (Ids of annotators used during training; default: None, using all instances regardless of annotator)
- output_model_dir (Directory to save the trained model)
- add_specific_labels (Include this flag to add the evasion labels, e.g., General, Partial, etc.)

Example commands:
```
>>> python lora.py --model_name "tiiuae/falcon-40b" --output_model_dir "falcon_40b_clarity"
```

or 

```
>>> python lora.py --model_name "tiiuae/falcon-40b" --output_model_dir "falcon_40b_evasion" --add_specific_labels
```
The second command trains a model on the evasion-based clarity problem (all the evasion labels) instead of only the 3 classes of the direct clarity problem.
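
To make the behaviour of the arguments listed above concrete, here is a hypothetical `argparse` sketch that mirrors them. It is not the actual parser in lora.py: the argument types, the `required` settings, and the space-separated integer format for `annotators_ids` are assumptions, and the task names returned by `task_name` are borrowed from the `--experiment` values of the encoder scripts.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented arguments; lora.py may differ."""
    parser = argparse.ArgumentParser(description="Sketch of the documented lora.py arguments")
    parser.add_argument("--model_name", type=str, required=True)
    # Documented default: 2700 training samples.
    parser.add_argument("--train_size", type=int, default=2700)
    # Assumption: ids are passed as space-separated integers; None means all annotators.
    parser.add_argument("--annotators_ids", type=int, nargs="+", default=None)
    parser.add_argument("--output_model_dir", type=str, required=True)
    # A boolean switch: False unless the flag is present on the command line.
    parser.add_argument("--add_specific_labels", action="store_true")
    return parser


def task_name(args):
    """Map the flag to the problem variant it selects."""
    return "evasion_based_clarity" if args.add_specific_labels else "direct_clarity"
```

Because `add_specific_labels` is a switch rather than a valued option, omitting it selects the direct clarity problem and including it selects the evasion-based one.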

Similarly, for training the encoders: 
```
>>> python encoder_train.py --model_name "roberta-base" --experiment "direct_clarity"
>>> python encoder_train.py --model_name "roberta-base" --experiment "evasion_based_clarity"
```

and for inference:
```
>>> python encoder_inference.py --model_name "roberta-base" --experiment "direct_clarity"
>>> python encoder_inference.py --model_name "roberta-base" --experiment "evasion_based_clarity"
```


## 4. Results Presented in the Paper
To export the results presented in the paper, run the following command:

```
>>> python results.py
```

## License
MIT