

Colab Notebooks:

Table4.ipynb: This Colab notebook reproduces the results in Table 4.

wiki-DoE.ipynb: This Colab notebook calculates the DoE scores used in Section 6 of the paper. The DoE scores are saved in 'toxic_DoEs.csv' and used by 'augment-DoE-based.ipynb' for data augmentation.

augment-DoE-based.ipynb: This Colab notebook trains the augmented Wiki classifier discussed in Section 6. The data augmentation is based on the DoE scores saved in 'toxic_DoEs.csv'. We use the Hugging Face Trainer to train a RoBERTa-based binary classifier.




Python modules:
 
word_process.py: used to preprocess tweets 

Roberta_model_data.py: class that defines the RoBERTa model and computes the gradients and logits of the classifier

TCAV.py: functions to calculate sensitivities and the TCAV scores (Section 4)

DoE.py: functions to calculate the DoE score (Sections 5 and 6)
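
To give a rough idea of what TCAV.py computes, here is a minimal sketch. It assumes the standard TCAV definitions: the sensitivity of an example is the dot product between the gradient of the class logit (with respect to a layer's activations) and a concept activation vector, and the TCAV score is the fraction of examples with positive sensitivity. The function names and toy values below are hypothetical and are not taken from TCAV.py, which may differ in its details.

```python
from typing import Sequence


def sensitivity(gradient: Sequence[float], cav: Sequence[float]) -> float:
    """Directional derivative of the class logit along the concept direction."""
    if len(gradient) != len(cav):
        raise ValueError("gradient and CAV must have the same length")
    return sum(g * v for g, v in zip(gradient, cav))


def tcav_score(gradients: Sequence[Sequence[float]], cav: Sequence[float]) -> float:
    """Fraction of examples whose sensitivity to the concept is positive."""
    if not gradients:
        raise ValueError("need at least one gradient")
    positive = sum(1 for g in gradients if sensitivity(g, cav) > 0)
    return positive / len(gradients)


# Toy gradients of a logit w.r.t. a 3-dimensional activation (made-up values).
grads = [
    [0.5, -0.1, 0.2],
    [-0.3, 0.4, 0.0],
    [0.1, 0.1, 0.1],
    [-0.2, -0.2, 0.5],
]
cav = [1.0, 0.0, 0.0]  # hypothetical concept activation vector

score = tcav_score(grads, cav)
print(score)
```

In the repository, the gradients would come from the classifier defined in Roberta_model_data.py rather than from hand-written lists.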
