Date: 2024-03-15-22-29-52
Model: gpt-3.5-azure-chat
Test on 1000 samples : (1 split)
Accuracy: 0.547
F1: 0.41879881505817274
Date: 2024-03-16-00-58-45
Model: gpt-3.5-azure-chat
Test on 1000 samples : (2 split)
Accuracy: 0.576
F1: 0.3996297709045238
Date: 2024-03-16-04-26-17
Model: gpt-3.5-azure-chat
Test on 1000 samples : (3 split)
Accuracy: 0.581
F1: 0.40475772458597725
Date: 2024-03-16-08-29-48
Model: gpt-3.5-azure-chat
Test on 1000 samples : (4 split)
Accuracy: 0.602
F1: 0.3996875951417063
Date: 2024-03-16-13-14-02
Model: gpt-3.5-azure-chat
Test on 1000 samples : (5 split)
Accuracy: 0.595
F1: 0.3831469341419088
Date: 2024-04-30-15-08-06
Split-value: 4
Model: mixtral-7B-chat
Test on 1000 samples :
Accuracy: 0.583
F1: 0.09309553844587615
Date: 2024-05-03-00-04-10
Split-value: 1
Model: mixtral-7B-chat
Test on 1000 samples :
Accuracy: 0.565
F1: 0.05737590125012216
Date: 2024-05-03-04-49-48
Split-value: 3
Model: mixtral-7B-chat
Test on 1000 samples :
Accuracy: 0.57
F1: 0.10496991460519606
Date: 2024-05-03-10-47-39
Split-value: 5
Model: mixtral-7B-chat
Test on 1000 samples :
Accuracy: 0.586
F1: 0.08191745191607991
Date: 2024-05-06-22-46-39
Split-value: 2
Model: mixtral-7B-chat
Test on 1000 samples :
Accuracy: 0.567
F1: 0.10515759494626502
Date: 2024-05-11-22-53-12
Split-value: 1
Model: llama-3-8B-groq-chat
Test on 1000 samples :
Accuracy: 0.523
F1: 0.23468604516215247
Date: 2024-05-14-18-07-42
Split-value: 1
Model: llama-3-70B-chat
Test on 1000 samples :
Accuracy: 0.618
F1: 0.15229535362724766
Date: 2024-05-18-03-49-22
Split-value: 2
Model: llama-3-8B-groq-chat
Test on 1000 samples :
Accuracy: 0.578
F1: 0.2261772904265222
Date: 2024-05-18-23-28-49
Split-value: 3
Model: llama-3-8B-groq-chat
Test on 1000 samples :
Accuracy: 0.554
F1: 0.2208399573660007
Date: 2024-05-19-01-59-01
Split-value: 2
Model: llama-3-70B-chat
Test on 1000 samples :
Accuracy: 0.647
F1: 0.1227954904828001
Date: 2024-05-21-02-29-23
Split-value: 3
Model: llama-3-70B-chat
Test on 1000 samples :
Accuracy: 0.624
F1: 0.1259231146879294
Date: 2024-05-22-08-05-06
Split-value: 5
Model: llama-3-70B-chat
Test on 1000 samples :
Accuracy: 0.638
F1: 0.14756391627032198
Date: 2024-05-23-19-40-15
Split-value: 3
Model: llama-3-8B-groq-chat
Test on 1000 samples :
Accuracy: 0.544
F1: 0.22743577249792213
Date: 2024-05-24-21-13-35
Split-value: 4
Model: llama-3-8B-groq-chat
Test on 1000 samples :
Accuracy: 0.547
F1: 0.22439950765192399
Date: 2024-05-25-03-04-11
Split-value: 3
Model: llama-3-70B-chat
Test on 1000 samples :
Accuracy: 0.636
F1: 0.1298594725913994
Date: 2024-05-25-08-34-05
Split-value: 2
Model: llama-3-8B-groq-chat
Test on 1000 samples :
Accuracy: 0.548
F1: 0.24111043584968456
Date: 2024-05-25-19-55-34
Split-value: 5
Model: llama-3-8B-groq-chat
Test on 1000 samples :
Accuracy: 0.554
F1: 0.22087289958850387
Date: 2024-06-07-05-47-32
Split-value: 1
Model: gpt-4-azure-chat
Test on 1000 samples :
Accuracy: 0.667
F1: 0.43717387168889116
Date: 2024-06-08-05-15-37
Split-value: 2
Model: gpt-4-azure-chat
Test on 1000 samples :
Accuracy: 0.695
F1: 0.4328187070814982
Date: 2024-06-09-10-23-07
Split-value: 1
Model: mixtral-7B-chat
Test on 100 samples :
Accuracy: 0.55
F1: 0.055275583247131374
Date: 2024-06-09-11-12-53
Split-value: 2
Model: mixtral-7B-chat
Test on 100 samples :
Accuracy: 0.53
F1: 0.10803245984578962
Date: 2024-06-09-12-54-50
Split-value: 3
Model: mixtral-7B-chat
Test on 100 samples :
Accuracy: 0.6
F1: 0.12612869196269202
Date: 2024-06-09-14-27-39
Split-value: 4
Model: mixtral-7B-chat
Test on 100 samples :
Accuracy: 0.57
F1: 0.11322454949234156
Date: 2024-06-09-16-11-36
Split-value: 5
Model: mixtral-7B-chat
Test on 100 samples :
Accuracy: 0.64
F1: 0.06696066691605868
Date: 2024-06-09-18-16-48
Split-value: 3
Model: gpt-4-azure-chat
Test on 1000 samples :
Accuracy: 0.699
F1: 0.4310284004054963
