This experiment (#7) used the following hyperparameters:
        - Model: BERT
        - Batch size: 1
        - Learning rate: 0.0001
        - Lambda: 0.1
        - Loss function: MSE
        - Embedding Operator: l2
        - Normalization: l1
        - Normalization2: none
        - Softmax enabled: False
        - CUDA enabled: True
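
The settings listed above could be captured in code roughly as follows. This is a minimal sketch, not the experiment's actual configuration code: the `ExperimentConfig` class and all of its field names are hypothetical, and only the values are taken from the list.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container; field names are assumptions, values come from the list."""
    experiment_id: int = 7
    model: str = "BERT"
    batch_size: int = 1
    learning_rate: float = 1e-4
    lambda_: float = 0.1              # trailing underscore avoids the `lambda` keyword
    loss_function: str = "MSE"
    embedding_operator: str = "l2"
    normalization: str = "l1"
    normalization2: str = "none"
    softmax_enabled: bool = False
    cuda_enabled: bool = True


config = ExperimentConfig()

# Print one "name: value" line per setting.
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```

Freezing the dataclass keeps a logged run's settings from being changed by accident after the fact; a plain dictionary would work equally well if mutability is wanted.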
    