#########################################################################
#				Code Appendix for the paper 							#
#	"Domain Knowledge Transferring for Pre-trained Language Model 		#
#		via Calibrated Activation Boundary Distillation"				#
#########################################################################

All datasets used in our paper are publicly available.
We preprocessed all of them using the preprocessing code provided by the BLUE benchmark.

We separated our code into two parts, as described in the paper.

CTT : Calibrated Teacher Training
ABD : Activation Boundary Distillation

1. CTT
The main training & modeling code is adapted from the BERT and BioBERT GitHub repositories.
We added a confidence penalty loss term (CPL) to the fine-tuning scripts (run_re_cpl, run_i2b2_cpl, run_hoc_cpl);
the strength of the penalty can be controlled through the 'confidence_penalty_weight' argument.
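A confidence penalty of this kind is commonly implemented as a negative-entropy term added to the cross-entropy loss (Pereyra et al., 2017). The sketch below is our illustration of that idea, not the repository's exact code; the function name and the `penalty_weight` parameter (standing in for 'confidence_penalty_weight') are our own:

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_confidence_penalty(logits, labels, penalty_weight=0.1):
    # Standard cross-entropy on the classification logits.
    ce = F.cross_entropy(logits, labels)
    # Entropy of the predicted distribution; subtracting it penalizes
    # over-confident (low-entropy) predictions.
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(probs * log_probs).sum(dim=-1).mean()
    return ce - penalty_weight * entropy
```

With penalty_weight=0 this reduces to plain cross-entropy; larger weights push the teacher toward smoother, better-calibrated output distributions.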

2. ABD
The final [CLS] embedding of the teacher model can be extracted with 'extract_cls.py' in /ABD/cls_extraction_and_dim_expansion.
To distill knowledge from the extracted embeddings, the vectors must then be expanded to a larger dimension using the 'dimension_expander' scripts in the same directory.
The expanded vectors are used in the actual ABD & evaluation steps, so the train, dev, and test examples must all be extracted and expanded.
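The expansion step can be pictured as mapping each teacher [CLS] vector from the teacher's hidden size to the student's. The sketch below is purely illustrative: it uses a fixed random linear projection, and the function name and arguments are our assumptions; the repository's 'dimension_expander' scripts may use a different expansion scheme.

```python
import torch

def expand_dimension(cls_vectors, target_dim, seed=0):
    # cls_vectors: (num_examples, teacher_dim) tensor of extracted
    # [CLS] embeddings; returns a (num_examples, target_dim) tensor.
    torch.manual_seed(seed)  # fixed seed so the same map is reused
    teacher_dim = cls_vectors.size(-1)
    # Random projection scaled to roughly preserve vector norms.
    proj = torch.randn(teacher_dim, target_dim) / teacher_dim ** 0.5
    return cls_vectors @ proj
```

Because train, dev, and test embeddings must all pass through the same map, the projection is seeded so every split is expanded consistently.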

The initial fine-tuning of the student model can be performed with the code in /ABD/initial_finetuning_ALBERT (or RoBERTa)/.
We separated the code by task for ease of use.

The actual ABD training code is in the /ABD/ABD_ALBERT (or RoBERTa)/ directory.
For ABD, the user should provide the initially fine-tuned ALBERT/RoBERTa model
and the teacher's previously extracted feature embedding files.
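The core of activation boundary distillation (Heo et al., 2019) is a margin-based hinge loss that trains the student's pre-activations to fall on the same side of the activation boundary as the teacher's. The sketch below shows that generic loss; the calibrated variant in our paper may form the teacher targets differently, and the function name and `margin` default are our assumptions:

```python
import torch

def activation_boundary_loss(teacher_pre, student_pre, margin=1.0):
    # teacher_pre / student_pre: pre-activation tensors of matching
    # shape (e.g. expanded teacher [CLS] features vs. student features).
    # 1 where the teacher's neuron fires (pre-activation > 0), else 0.
    teacher_active = (teacher_pre > 0).float()
    # If the teacher is active, push the student above +margin;
    # if inactive, push it below -margin. Squared hinge on violations.
    loss = (teacher_active * torch.clamp(margin - student_pre, min=0) ** 2
            + (1 - teacher_active) * torch.clamp(margin + student_pre, min=0) ** 2)
    return loss.mean()
```

The loss is zero once every student pre-activation clears the margin on the teacher's side of the boundary, so it transfers the decision boundary itself rather than matching activation magnitudes.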

##### Core library versions we used #####

tensorflow-gpu==1.15.0
torch==1.8.1
transformers==4.7.0

##########
The training code using PyTorch and transformers references the tutorial below:
https://mccormickml.com/2019/07/22/BERT-fine-tuning/
##########