This experiment (#27) used the following hyperparameters:
- Model: BERT
- Batch size: 12
- Learning rate: 1e-05
- Lambda: 0.0
- Loss function: MSE
- Embedding operator: dot
- Normalization: l1
- Normalization2: none
- CUDA enabled: True
- Importance: first_token
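To keep these settings reproducible alongside a run, they could be recorded in code rather than only as text. The sketch below is a hypothetical Python dataclass that simply transcribes the values listed above. The class name `ExperimentConfig` and all field names are illustrative assumptions, not taken from the experiment's actual code. The meaning of options such as `importance="first_token"` is not defined here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical, immutable record of experiment #27's hyperparameters.

    Field names are illustrative; they mirror the list above, not any real codebase.
    """
    experiment_id: int = 27
    model: str = "BERT"
    batch_size: int = 12
    learning_rate: float = 1e-5
    lambda_: float = 0.0             # "Lambda"; trailing underscore avoids the Python keyword
    loss_function: str = "MSE"
    embedding_operator: str = "dot"
    normalization: str = "l1"
    normalization2: str = "none"
    cuda_enabled: bool = True
    importance: str = "first_token"


config = ExperimentConfig()
# asdict() yields a plain dict, convenient for logging or saving as JSON.
print(asdict(config))
```

Because the dataclass is frozen, an accidental reassignment (e.g. `config.batch_size = 32`) raises an error instead of silently changing the recorded run.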
    