Requirements:

itertools
scipy
termcolor
torch
transformers
tqdm
pandas
numpy
nltk
argparse
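
The third-party packages above can be installed with pip; note that itertools and argparse ship with Python's standard library, so they need no separate install (a sketch, assuming a standard pip environment):

```shell
# Install the third-party requirements. itertools and argparse are part of
# the Python standard library and do not need to be installed.
pip install scipy termcolor torch transformers tqdm pandas numpy nltk
```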



To run CLOSS with default parameters on IMDB with BERT, execute:

CUDA_VISIBLE_DEVICES=0 python3 closs.py --dataset imdb --beamwidth 15 --w 5 --K 30 --evaluation svs --model bert --retrain_epochs 10 --lm_head texts:imdb_seperate --saliency_method norm_grad --log_note _test --percent_replace 15 --percent_salient 15


Similarly, for QNLI with RoBERTa:

CUDA_VISIBLE_DEVICES=0 python3 closs.py --dataset qnli --beamwidth 15 --w 5 --K 30 --evaluation svs --model roberta --retrain_epochs 10 --lm_head texts:qnli_seperate --saliency_method norm_grad --log_note _test --percent_replace 15 --percent_salient 15

The first run retrains the language modeling head, which takes around 10 minutes. The result is cached, so subsequent runs on the same dataset skip this step.


Set CUDA_VISIBLE_DEVICES to select which CUDA GPU is used.
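
For example, to run the IMDB/BERT command on the second GPU instead of the first (a sketch; device indices start at 0):

```shell
# Expose only GPU 1 to PyTorch for this run; an empty value would hide all
# GPUs and force CPU execution.
export CUDA_VISIBLE_DEVICES=1
python3 closs.py --dataset imdb --beamwidth 15 --w 5 --K 30 --evaluation svs --model bert --retrain_epochs 10 --lm_head texts:imdb_seperate --saliency_method norm_grad --log_note _test --percent_replace 15 --percent_salient 15
```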



--evaluation selects the specific test to run. Use "closs" for default CLOSS, "closs-eo" for CLOSS without embedding optimization, and "closs-sv" for CLOSS without Shapley values. Use "hotflip" to run the HotFlip baseline: HotFlip O uses beamwidth 100 and K 1, while HotFlip D sets both beamwidth and K to 10.
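
For instance, the two HotFlip variants on IMDB with BERT might be invoked as follows (a sketch: only --evaluation, --beamwidth, and --K differ from the CLOSS examples above; the remaining flags are assumed to carry over unchanged):

```shell
# HotFlip O: wide beam (100), one candidate per position (K=1).
CUDA_VISIBLE_DEVICES=0 python3 closs.py --dataset imdb --beamwidth 100 --w 5 --K 1 --evaluation hotflip --model bert --retrain_epochs 10 --lm_head texts:imdb_seperate --saliency_method norm_grad --log_note _test --percent_replace 15 --percent_salient 15

# HotFlip D: beamwidth and K both set to 10.
CUDA_VISIBLE_DEVICES=0 python3 closs.py --dataset imdb --beamwidth 10 --w 5 --K 10 --evaluation hotflip --model bert --retrain_epochs 10 --lm_head texts:imdb_seperate --saliency_method norm_grad --log_note _test --percent_replace 15 --percent_salient 15
```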

To run CLOSS-RTL, use --lm_head default and --evaluation svs.
For example, CLOSS-RTL on IMDB with BERT:

CUDA_VISIBLE_DEVICES=0 python3 closs.py --dataset imdb --beamwidth 15 --w 5 --K 30 --evaluation svs --model bert --retrain_epochs 10 --lm_head default --saliency_method norm_grad --log_note _test --percent_replace 15 --percent_salient 15


After a run finishes, summary statistics are written to a text file in text_logs, and a TSV log of each counterfactual is saved in tsv_logs.
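
The TSV logs are plain tab-separated text, so standard tools can inspect them. Here is a quick way to view one with aligned columns, demonstrated on a hypothetical two-line sample (substitute the real file written to tsv_logs/ after a run; the column names shown are illustrative, not the actual log schema):

```shell
# Create a small sample TSV standing in for a real tsv_logs/ file.
printf 'original\tcounterfactual\nfoo\tbar\n' > sample_log.tsv

# Split on tabs and print each row with the first column left-padded to a
# fixed width, so the columns line up for eyeballing.
awk -F '\t' '{ printf "%-20s %s\n", $1, $2 }' sample_log.tsv
```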
