METADATA
--------
Model Name: BERT
Attack Target: first_token
Gradient Model File: ../nli_attack_models/experiment_8/model_iter1800_epoch0.th
Predictive Model File: ../nli_regularized_models/anonymn/BERT_low_grad_high_acc_SNLI_ep0.th
Baseline Model File: ../nli_baseline_models/anonymn/BERT_trained2_SNLI.th
CUDA: True

Gradient Combined
-----------------
mean_reciprocal_rank_first: 0.997
hit_rate_1_first: 0.994
mean_grad_attribution_first: 0.872

Gradient Simple Combined
------------------------
mean_reciprocal_rank_first: 0.989
hit_rate_1_first: 0.983
mean_grad_attribution_first: 0.750

Gradient Regularized
--------------------
mean_reciprocal_rank_first: 0.097
hit_rate_1_first: 0.011
mean_grad_attribution_first: 0.025

Gradient Baseline
-----------------
mean_reciprocal_rank_first: 0.090
hit_rate_1_first: 0.006
mean_grad_attribution_first: 0.023

Gradient Evil Twin
------------------
mean_reciprocal_rank_first: 1.000
hit_rate_1_first: 1.000
mean_grad_attribution_first: 0.998

########################################################

SmoothGrad Combined
-------------------
mean_reciprocal_rank_first: 0.988
hit_rate_1_first: 0.982
mean_grad_attribution_first: 0.833

SmoothGrad Simple Combined
--------------------------
mean_reciprocal_rank_first: 0.982
hit_rate_1_first: 0.971
mean_grad_attribution_first: 0.688

SmoothGrad Regularized
----------------------
mean_reciprocal_rank_first: 0.099
hit_rate_1_first: 0.013
mean_grad_attribution_first: 0.026

SmoothGrad Baseline
-------------------
mean_reciprocal_rank_first: 0.093
hit_rate_1_first: 0.011
mean_grad_attribution_first: 0.024

SmoothGrad Evil Twin
--------------------
mean_reciprocal_rank_first: 1.000
hit_rate_1_first: 1.000
mean_grad_attribution_first: 0.998

########################################################

InteGrad Combined
-----------------
mean_reciprocal_rank_first: 0.230
hit_rate_1_first: 0.056
mean_grad_attribution_first: 0.053

InteGrad Simple Combined
------------------------
mean_reciprocal_rank_first: 0.151
hit_rate_1_first: 0.025
mean_grad_attribution_first: 0.033

InteGrad Regularized
--------------------
mean_reciprocal_rank_first: 0.072
hit_rate_1_first: 0.003
mean_grad_attribution_first: 0.016

InteGrad Baseline
-----------------
mean_reciprocal_rank_first: 0.070
hit_rate_1_first: 0.003
mean_grad_attribution_first: 0.015

InteGrad Evil Twin
------------------
mean_reciprocal_rank_first: 1.000
hit_rate_1_first: 1.000
mean_grad_attribution_first: 0.992

########################################################

MODEL ACCURACIES
----------------
Combined Model Acc: 0.903
Simple Combined Model Acc: 0.905
Regularized Model Acc: 0.905
Baseline Model Acc: 0.907
Evil Twin Model Acc: 0.329
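
The three per-model metrics can be approximated with the minimal Python sketch below. It rests on several assumptions not stated in this log: each example yields one attribution score per token, the attack target (first_token) is token index 0, ranks are taken over absolute attribution in descending order, and mean_grad_attribution_first is the share of total absolute attribution assigned to the first token. The helper names rank_of_target and first_token_metrics are hypothetical and do not come from the code that produced these results.

```python
from typing import Dict, List, Sequence


def rank_of_target(attributions: Sequence[float], target: int = 0) -> int:
    """Return the 1-based rank of token `target` by descending |attribution|.

    Ties are resolved in the target's favor (only strictly larger scores
    push it down); this tie rule is an assumption of the sketch.
    """
    scores = [abs(a) for a in attributions]
    target_score = scores[target]
    return 1 + sum(1 for s in scores if s > target_score)


def first_token_metrics(batch: List[Sequence[float]]) -> Dict[str, float]:
    """Aggregate first-token metrics over a batch of per-token attribution vectors."""
    n = len(batch)
    ranks = [rank_of_target(attr, target=0) for attr in batch]

    # Mean reciprocal rank: average of 1/rank of the first token.
    mrr = sum(1.0 / r for r in ranks) / n
    # Hit rate @1: fraction of examples where the first token is ranked first.
    hit1 = sum(1 for r in ranks if r == 1) / n

    # Assumed definition: share of total |attribution| placed on the first token.
    shares = []
    for attr in batch:
        total = sum(abs(a) for a in attr)
        shares.append(abs(attr[0]) / total if total > 0 else 0.0)
    mean_share = sum(shares) / n

    return {
        "mean_reciprocal_rank_first": mrr,
        "hit_rate_1_first": hit1,
        "mean_grad_attribution_first": mean_share,
    }
```

Under this definition, every example whose largest attribution falls on the first token adds 1 to both the reciprocal-rank sum and the hit count, which is consistent with the Evil Twin rows reaching 1.000 on both rank metrics.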
