Evaluating on G4
Loading fasttext embeddings
Embed load complete!
Running experiment number 0 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.4812203424759703 from epoch 14
Best val F1 0.6361442753475179 from epoch 9
Loading best model, which was from epoch 9
On holdout set 'TEST_SET' - Accuracy: 0.9884753206515786. Precision: [0.99547047 0.11461412]. Recall: [0.99293068 0.16843629]. F1: [0.99419895 0.13640805] (Mean 0.5653035021399331).
Running experiment number 1 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.49306986438051964 from epoch 14
Best val F1 0.6116902729853355 from epoch 9
Loading best model, which was from epoch 9
On holdout set 'TEST_SET' - Accuracy: 0.989343779176816. Precision: [0.99556555 0.13803019]. Recall: [0.99371208 0.18532819]. F1: [0.99463795 0.15822002] (Mean 0.5764289874036547).
Running experiment number 2 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.42720730717228966 from epoch 12
Best val F1 0.6192487747787286 from epoch 7
Loading best model, which was from epoch 7
On holdout set 'TEST_SET' - Accuracy: 0.9904365242881509. Precision: [0.99545861 0.14975845]. Recall: [0.99492351 0.16457529]. F1: [0.99519099 0.15681766] (Mean 0.5760043245951979).
Running experiment number 3 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.46572809566806567 from epoch 13
Best val F1 0.6410273836534541 from epoch 8
Loading best model, which was from epoch 8
On holdout set 'TEST_SET' - Accuracy: 0.9912736870106771. Precision: [0.99548062 0.17664975]. Recall: [0.99574687 0.16795367]. F1: [0.99561372 0.17219198] (Mean 0.5839028543886877).
Running experiment number 4 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.3801127540497225 from epoch 11
Best val F1 0.6260813640161864 from epoch 6
Loading best model, which was from epoch 6
On holdout set 'TEST_SET' - Accuracy: 0.982737756821181. Precision: [0.9956512  0.07921525]. Recall: [0.98695479 0.20656371]. F1: [0.99128392 0.11451505] (Mean 0.5528994849967541).
Running experiment number 5 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.498951512608341 from epoch 14
Best val F1 0.6379752809500641 from epoch 9
Loading best model, which was from epoch 9
On holdout set 'TEST_SET' - Accuracy: 0.9884127290461561. Precision: [0.99561352 0.12708399]. Recall: [0.99272353 0.19498069]. F1: [0.99416642 0.15387545] (Mean 0.574020936388661).
Running experiment number 6 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.3953054370638162 from epoch 11
Best val F1 0.6221431285016326 from epoch 6
Loading best model, which was from epoch 6
On holdout set 'TEST_SET' - Accuracy: 0.9848997751918172. Precision: [0.99559007 0.08871681]. Recall: [0.98919935 0.19353282]. F1: [0.99238442 0.12166262] (Mean 0.5570235230634051).
Running experiment number 7 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.5480335877784602 from epoch 16
Best val F1 0.6477747290577712 from epoch 11
Loading best model, which was from epoch 11
On holdout set 'TEST_SET' - Accuracy: 0.9929610523735258. Precision: [0.99531976 0.2372171 ]. Recall: [0.99761384 0.13658301]. F1: [0.99646548 0.17335375] (Mean 0.5849096148778542).
Running experiment number 8 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.5283194248176739 from epoch 15
Best val F1 0.6671252297762291 from epoch 10
Loading best model, which was from epoch 10
On holdout set 'TEST_SET' - Accuracy: 0.9885874639446273. Precision: [0.99563775 0.13194888]. Recall: [0.99287561 0.19932432]. F1: [0.99425476 0.15878508] (Mean 0.5765199212854616).
Running experiment number 9 out of 10
Running on device:  cuda


Training complete. Best (unpaired) train F1 0.4310031666960131 from epoch 12
Best val F1 0.6262992575253282 from epoch 7
Loading best model, which was from epoch 7
On holdout set 'TEST_SET' - Accuracy: 0.9858934169278997. Precision: [0.99561541 0.09843562]. Recall: [0.99017741 0.19739382]. F1: [0.99288897 0.13136342] (Mean 0.5621261922802725).
For holdout TEST_SET; mean F1 is 0.5709139341419881 with std 0.010431513902163297; mean accuracy 0.988302150543243 and std 0.00290929869761201
F1 95% confidence interval: (0.564448414840577, 0.5773794534433992)
Accuracy 95% confidence interval: (0.9864989485287128, 0.9901053525577732)
F1s:  [0.5653035021399331, 0.5764289874036547, 0.5760043245951979, 0.5839028543886877, 0.5528994849967541, 0.574020936388661, 0.5570235230634051, 0.5849096148778542, 0.5765199212854616, 0.5621261922802725]
Accuracies:  [0.9884753206515786, 0.989343779176816, 0.9904365242881509, 0.9912736870106771, 0.982737756821181, 0.9884127290461561, 0.9848997751918172, 0.9929610523735258, 0.9885874639446273, 0.9858934169278997]
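
The summary lines above can be reproduced from the two lists printed at the end of the log. The script that produced the log is not shown, so the sketch below is an assumption inferred from the numbers: the reported std appears to be the population standard deviation (ddof=0), and the 95% interval appears to be a normal approximation, mean +/- 1.96 * std / sqrt(n). The helper name `summarize` is hypothetical.

```python
import math
import statistics

# Values copied from the "F1s:" and "Accuracies:" lines of the log.
f1s = [0.5653035021399331, 0.5764289874036547, 0.5760043245951979,
       0.5839028543886877, 0.5528994849967541, 0.574020936388661,
       0.5570235230634051, 0.5849096148778542, 0.5765199212854616,
       0.5621261922802725]
accuracies = [0.9884753206515786, 0.989343779176816, 0.9904365242881509,
              0.9912736870106771, 0.982737756821181, 0.9884127290461561,
              0.9848997751918172, 0.9929610523735258, 0.9885874639446273,
              0.9858934169278997]


def summarize(values, z=1.96):
    """Return (mean, population std, (ci_low, ci_high)).

    Assumption: a z-based interval with z = 1.96 and ddof=0, which is
    what the logged numbers are consistent with; the original code may
    differ in detail.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, i.e. ddof=0
    half_width = z * std / math.sqrt(len(values))
    return mean, std, (mean - half_width, mean + half_width)


f1_mean, f1_std, f1_ci = summarize(f1s)
acc_mean, acc_std, acc_ci = summarize(accuracies)

print(f"mean F1 {f1_mean} with std {f1_std}")
print(f"F1 95% confidence interval: {f1_ci}")
print(f"mean accuracy {acc_mean} with std {acc_std}")
print(f"Accuracy 95% confidence interval: {acc_ci}")
```

The printed values should agree with the logged summary up to floating-point rounding. With only 10 runs, a Student's t interval (critical value about 2.262 for 9 degrees of freedom) combined with the sample standard deviation (ddof=1) would be wider than the z-based interval reported here.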
