This experiment (#6) used the following hyperparameters:
        - Model: BERT
        - Batch size: 12
        - Learning rate: 1e-05
        - Lambda: 0.1
        - Loss function: MSE
        - Embedding Operator: dot
        - Normalization: l1
        - Normalization2: none
        - Cuda enabled: True
        - Importance: stop_token
        - Attack Target: premise
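The settings above can be collected into a single configuration object. A minimal sketch in Python, assuming a plain dictionary is used (the key names here are illustrative, not the original code's):

```python
# Hypothetical config dict mirroring experiment #6's settings;
# key names are illustrative and may differ from the actual codebase.
config = {
    "model": "BERT",
    "batch_size": 12,
    "learning_rate": 1e-05,
    "lambda": 0.1,               # weight on the secondary loss term
    "loss_function": "MSE",
    "embedding_operator": "dot",
    "normalization": "l1",
    "normalization2": "none",
    "cuda": True,
    "importance": "stop_token",
    "attack_target": "premise",  # the premise sentence is perturbed
}

# Example: derive a short run identifier from the config.
run_id = f"{config['model']}_bs{config['batch_size']}_lr{config['learning_rate']}"
```

Keeping the settings in one dictionary makes it easy to log the full configuration alongside results and to reproduce a run later.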