METADATA
--------
Model Name: BERT
Attack Target: stop_token
Gradient Model File: ../nli_attack_models/experiment_3/model_iter2200_epoch0.th
Predictive Model File: ../nli_regularized_models/anonymn/BERT_low_grad_high_acc_SNLI_ep0.th
Baseline Model File: ../nli_baseline_models/anonymn/BERT_trained2_SNLI.th
Cuda: True

Gradient Combined
-----------------
mean_reciprocal_rank: 0.964
hit_rate_1: 0.940
mean_grad_attribution: 0.837

Gradient Regularized
--------------------
mean_reciprocal_rank: 0.264
hit_rate_1: 0.062
mean_grad_attribution: 0.207

Gradient Baseline
-----------------
mean_reciprocal_rank: 0.268
hit_rate_1: 0.051
mean_grad_attribution: 0.208

Gradient Evil Twin
------------------
mean_reciprocal_rank: 1.000
hit_rate_1: 1.000
mean_grad_attribution: 0.998

Gradient Simple Combined
------------------------
mean_reciprocal_rank: 0.870
hit_rate_1: 0.792
mean_grad_attribution: 0.639

#######################################################

SmoothGrad Combined
-------------------
mean_reciprocal_rank: 0.943
hit_rate_1: 0.905
mean_grad_attribution: 0.799

SmoothGrad Regularized
----------------------
mean_reciprocal_rank: 0.249
hit_rate_1: 0.048
mean_grad_attribution: 0.203

SmoothGrad Baseline
-------------------
mean_reciprocal_rank: 0.256
hit_rate_1: 0.049
mean_grad_attribution: 0.201

SmoothGrad Evil Twin
--------------------
mean_reciprocal_rank: 1.000
hit_rate_1: 1.000
mean_grad_attribution: 0.998

SmoothGrad Simple Combined
--------------------------
mean_reciprocal_rank: 0.822
hit_rate_1: 0.721
mean_grad_attribution: 0.595

#######################################################

InteGrad Combined
-----------------
mean_reciprocal_rank: 0.315
hit_rate_1: 0.062
mean_grad_attribution: 0.238

InteGrad Regularized
--------------------
mean_reciprocal_rank: 0.273
hit_rate_1: 0.040
mean_grad_attribution: 0.215

InteGrad Baseline
-----------------
mean_reciprocal_rank: 0.293
hit_rate_1: 0.040
mean_grad_attribution: 0.204

InteGrad Evil Twin
------------------
mean_reciprocal_rank: 0.999
hit_rate_1: 0.998
mean_grad_attribution: 0.980

InteGrad Simple Combined
------------------------
mean_reciprocal_rank: 0.303
hit_rate_1: 0.039
mean_grad_attribution: 0.212

MODEL ACCURACIES
----------------
Simple Combined Model Acc: 0.904 (ON FULL)
Combined Model Acc: 0.902 (ON FULL)
Regularized Model Acc: 0.905 (ON FULL)
Baseline Model Acc: 0.907
Evil Twin Model Acc: 0.343
