Evaluating on D2
Loading fasttext embeddings
Embed load complete!
Running experiment number 0 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.20377157368212534 from epoch 11
Best val F1 0.5634024271913982 from epoch 6
Loading best model, which was from epoch 6
On holdout set 'TEST_SET' - Accuracy: 0.9561236850494103. Precision: [0.97521948 0.22417252]. Recall: [0.97966721 0.19094404]. F1: [0.97743828 0.20622837] (Mean 0.5918333292306491).
Running experiment number 1 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.25338544190464285 from epoch 29
Best val F1 0.5763353091698248 from epoch 24
Loading best model, which was from epoch 24
On holdout set 'TEST_SET' - Accuracy: 0.9571692700031877. Precision: [0.97545732 0.23843782]. Recall: [0.98052153 0.19820589]. F1: [0.97798287 0.21646839] (Mean 0.5972256294550756).
Running experiment number 2 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.21843422749321453 from epoch 15
Best val F1 0.5726695103527948 from epoch 10
Loading best model, which was from epoch 10
On holdout set 'TEST_SET' - Accuracy: 0.9603187759005419. Precision: [0.97453439 0.24918673]. Recall: [0.98483255 0.1636053 ]. F1: [0.97965641 0.1975245 ] (Mean 0.5885904524360432).
Running experiment number 3 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.20979593950814499 from epoch 10
Best val F1 0.5526105733904945 from epoch 5
Loading best model, which was from epoch 5
On holdout set 'TEST_SET' - Accuracy: 0.9526936563595793. Precision: [0.97595686 0.21419624]. Recall: [0.97526418 0.21913712]. F1: [0.9756104  0.21663851] (Mean 0.5961244554892863).
Running experiment number 4 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.22722607307235385 from epoch 18
Best val F1 0.568791032826261 from epoch 13
Loading best model, which was from epoch 13
On holdout set 'TEST_SET' - Accuracy: 0.9597577303155881. Precision: [0.97456921 0.2435494 ]. Recall: [0.98420167 0.16531397]. F1: [0.97936176 0.19694656] (Mean 0.5881541613336714).
For holdout TEST_SET; mean F1 is 0.5923856055889452 with std 0.003742142346768675; mean accuracy 0.9572126235256615 and std 0.0027471615143346206
F1 95% confidence interval: (0.5891054731987538, 0.5956657379791365)
Accuracy 95% confidence interval: (0.9548046302883019, 0.9596206167630211)
F1s:  [0.5918333292306491, 0.5972256294550756, 0.5885904524360432, 0.5961244554892863, 0.5881541613336714]
Accuracies:  [0.9561236850494103, 0.9571692700031877, 0.9603187759005419, 0.9526936563595793, 0.9597577303155881]
Evaluating on D2
Loading fasttext embeddings
Embed load complete!
Running experiment number 0 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.20038783256246764 from epoch 11
Best val F1 0.5851650476973953 from epoch 6
Loading best model, which was from epoch 6
On holdout set 'TEST_SET' - Accuracy: 0.9584953777494422. Precision: [0.97470962 0.23368298]. Recall: [0.98271647 0.17129432]. F1: [0.97869667 0.19768302] (Mean 0.5881898425243581).
Running experiment number 1 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.20480340799864127 from epoch 9
Best val F1 0.585032694393775 from epoch 5
Loading best model, which was from epoch 5
On holdout set 'TEST_SET' - Accuracy: 0.9552566145999363. Precision: [0.97533436 0.21976967]. Recall: [0.97862888 0.19564289]. F1: [0.97697884 0.20700565] (Mean 0.5919922458854916).
Running experiment number 2 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.22309303622271826 from epoch 15
Best val F1 0.5892752247341131 from epoch 11
Loading best model, which was from epoch 11
On holdout set 'TEST_SET' - Accuracy: 0.9600637551801083. Precision: [0.97436731 0.24167211]. Recall: [0.98474055 0.15805211]. F1: [0.97952647 0.1911157 ] (Mean 0.585321085168731).
Running experiment number 3 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.19354971095931842 from epoch 7
Best val F1 0.5666388585236752 from epoch 2
Loading best model, which was from epoch 2
On holdout set 'TEST_SET' - Accuracy: 0.9643863563914568. Precision: [0.97291228 0.25854701]. Recall: [0.9908785  0.10337463]. F1: [0.98181321 0.14769606] (Mean 0.5647546364129474).
Running experiment number 4 out of 5
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.20037848049769202 from epoch 9
Best val F1 0.6007190256401833 from epoch 5
Loading best model, which was from epoch 5
On holdout set 'TEST_SET' - Accuracy: 0.9630474976091807. Precision: [0.9734874  0.25591586]. Recall: [0.98884128 0.12473302]. F1: [0.98110427 0.1677197 ] (Mean 0.5744119866993486).
For holdout TEST_SET; mean F1 is 0.5809339593381753 with std 0.009984858863775946; mean accuracy 0.960249920306025 and std 0.0032552333608336036
F1 95% confidence interval: (0.5721818446574387, 0.5896860740189118)
Accuracy 95% confidence interval: (0.957396582459665, 0.963103258152385)
F1s:  [0.5881898425243581, 0.5919922458854916, 0.585321085168731, 0.5647546364129474, 0.5744119866993486]
Accuracies:  [0.9584953777494422, 0.9552566145999363, 0.9600637551801083, 0.9643863563914568, 0.9630474976091807]
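The summary lines above (mean, std, and 95% confidence interval) can be reproduced from the per-run F1 list. The following is a minimal sketch, not the script that produced this log: the helper name `summarize` is hypothetical, and the interval formula is an assumption inferred from the printed numbers, which appear consistent with a population standard deviation (divide by n, not n-1) and a normal approximation of mean ± 1.96 · std / sqrt(n).

```python
import math
import statistics


def summarize(scores, z=1.96):
    """Return (mean, population std, normal-approximation CI) for a list of scores.

    Assumption: the log's interval is mean +/- z * std / sqrt(n) with z = 1.96
    and std computed with ddof=0. This is inferred from the logged values.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # population std (divides by n)
    half_width = z * std / math.sqrt(n)
    return mean, std, (mean - half_width, mean + half_width)


# Per-run macro F1 values copied from the last "F1s:" line of the log.
f1s = [
    0.5881898425243581,
    0.5919922458854916,
    0.585321085168731,
    0.5647546364129474,
    0.5744119866993486,
]

mean_f1, std_f1, ci_f1 = summarize(f1s)
print(f"mean F1 {mean_f1} with std {std_f1}")
print(f"F1 95% confidence interval: {ci_f1}")
```

With only five runs, a Student's t interval with a sample standard deviation (n-1) would be wider than this normal approximation, so the logged intervals are best read as optimistic.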
