Date: 2024-03-15-22-32-26
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7818181818181819
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.75
True Belief accuracy: 0.85
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.8125
Date: 2024-03-16-01-03-55
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8090909090909091
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.8
True Belief accuracy: 0.85
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.75
Date: 2024-03-16-04-36-22
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7818181818181819
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.85
True Belief accuracy: 0.75
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.6875
Date: 2024-03-16-08-43-06
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7818181818181819
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.75
True Belief accuracy: 0.85
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.8125
Date: 2024-03-16-13-29-50
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8090909090909091
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.8
True Belief accuracy: 0.8
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.8125
Date: 2024-04-06-14-16-48
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7818181818181819
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.75
True Belief accuracy: 0.85
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.8125
Date: 2024-04-06-14-20-59
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.75
True Belief accuracy: 0.85
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.75
Date: 2024-04-06-14-28-42
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7636363636363637
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.8
True Belief accuracy: 0.75
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.75
Date: 2024-04-06-14-39-16
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.8
True Belief accuracy: 0.8
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.875
Date: 2024-04-06-14-51-24
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.8
True Belief accuracy: 0.8
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.75
Date: 2024-04-06-15-26-35
Model: gpt-3.5-azure-chat
Test on 100 samples :
All Accuracy: 0.76
Trusted Testimony accuracy: 0.8947368421052632
False Belief accuracy: 0.7647058823529411
True Belief accuracy: 0.7894736842105263
Late Label accuracy: 0.6428571428571429
Uninformative Label accuracy: 0.75
Transparent Access accuracy: 0.6666666666666666
Date: 2024-04-06-15-30-32
Model: gpt-3.5-azure-chat
Test on 100 samples :
All Accuracy: 0.78
Trusted Testimony accuracy: 0.9473684210526315
False Belief accuracy: 0.7647058823529411
True Belief accuracy: 0.7894736842105263
Late Label accuracy: 0.5714285714285714
Uninformative Label accuracy: 0.875
Transparent Access accuracy: 0.6666666666666666
Date: 2024-04-06-15-38-03
Model: gpt-3.5-azure-chat
Test on 100 samples :
All Accuracy: 0.75
Trusted Testimony accuracy: 0.8947368421052632
False Belief accuracy: 0.7647058823529411
True Belief accuracy: 0.7894736842105263
Late Label accuracy: 0.5714285714285714
Uninformative Label accuracy: 0.8125
Transparent Access accuracy: 0.6
Date: 2024-04-06-15-48-01
Model: gpt-3.5-azure-chat
Test on 100 samples :
All Accuracy: 0.74
Trusted Testimony accuracy: 0.9473684210526315
False Belief accuracy: 0.8235294117647058
True Belief accuracy: 0.7894736842105263
Late Label accuracy: 0.5714285714285714
Uninformative Label accuracy: 0.6875
Transparent Access accuracy: 0.5333333333333333
Date: 2024-04-06-15-58-57
Model: gpt-3.5-azure-chat
Test on 100 samples :
All Accuracy: 0.75
Trusted Testimony accuracy: 0.8947368421052632
False Belief accuracy: 0.7647058823529411
True Belief accuracy: 0.7894736842105263
Late Label accuracy: 0.7142857142857143
Uninformative Label accuracy: 0.6875
Transparent Access accuracy: 0.6
Date: 2024-04-07-13-04-55
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7454545454545455
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.65
True Belief accuracy: 0.8
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.6875
Date: 2024-04-07-13-09-42
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7909090909090909
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.75
True Belief accuracy: 0.85
Late Label accuracy: 0.4666666666666667
Uninformative Label accuracy: 0.8947368421052632
Transparent Access accuracy: 0.75
Date: 2024-04-07-13-18-25
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7545454545454545
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.65
True Belief accuracy: 0.8
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.6875
Date: 2024-04-07-13-29-29
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7272727272727273
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.8
True Belief accuracy: 0.75
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.6842105263157895
Transparent Access accuracy: 0.5625
Date: 2024-04-07-13-41-32
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7363636363636363
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.75
True Belief accuracy: 0.8
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.631578947368421
Transparent Access accuracy: 0.6875
Date: 2024-04-13-15-53-53
Output type: multiple
Model: gpt-3.5-azure-chat
Test on 110 samples : (1 split)
All Accuracy: 0.9181818181818182
Trusted Testimony accuracy: 1.0
False Belief accuracy: 0.85
True Belief accuracy: 0.85
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 1.0
Transparent Access accuracy: 0.9375

Date: 2024-04-13-16-01-14
Output type: multiple
Model: gpt-3.5-azure-chat
Test on 110 samples : (2 split)
All Accuracy: 0.9
Trusted Testimony accuracy: 1.0
False Belief accuracy: 0.85
True Belief accuracy: 0.85
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.9473684210526315
Transparent Access accuracy: 0.875

Date: 2024-04-13-16-09-18
Output type: multiple
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8636363636363636
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.9
True Belief accuracy: 0.75
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 1.0
Transparent Access accuracy: 0.875

Date: 2024-04-13-16-20-06
Output type: multiple
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8181818181818182
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.9
True Belief accuracy: 0.8
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.8947368421052632
Transparent Access accuracy: 0.75

Date: 2024-04-13-16-35-28
Output type: multiple
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8454545454545455
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.85
True Belief accuracy: 0.85
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.75

Date: 2024-04-26-08-23-26
Output type: multiple
Split-value: 1
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.6636363636363637
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.5
True Belief accuracy: 0.8
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.5263157894736842
Transparent Access accuracy: 0.8125

Date: 2024-04-26-08-30-06
Output type: multiple
Split-value: 2
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.5818181818181818
Trusted Testimony accuracy: 0.7
False Belief accuracy: 0.6
True Belief accuracy: 0.6
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.5263157894736842
Transparent Access accuracy: 0.5

Date: 2024-04-26-08-38-38
Output type: multiple
Split-value: 3
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.6818181818181818
Trusted Testimony accuracy: 0.7
False Belief accuracy: 0.75
True Belief accuracy: 0.7
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.5625

Date: 2024-04-26-08-49-18
Output type: multiple
Split-value: 4
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7
Trusted Testimony accuracy: 0.6
False Belief accuracy: 0.8
True Belief accuracy: 0.65
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.75

Date: 2024-04-26-09-01-57
Output type: multiple
Split-value: 5
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7
Trusted Testimony accuracy: 0.7
False Belief accuracy: 0.8
True Belief accuracy: 0.65
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.6875

Date: 2024-04-27-10-01-22
Output type: multiple
Split-value: 1
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7818181818181819
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.65
True Belief accuracy: 0.8
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.8125

Date: 2024-04-27-10-05-20
Output type: multiple
Split-value: 2
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7909090909090909
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.65
True Belief accuracy: 0.85
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.75

Date: 2024-04-27-10-11-28
Output type: multiple
Split-value: 3
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7545454545454545
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.75
True Belief accuracy: 0.8
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.5

Date: 2024-04-27-10-18-43
Output type: multiple
Split-value: 4
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8363636363636363
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.9
True Belief accuracy: 0.9
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.6875

Date: 2024-04-27-10-28-48
Output type: multiple
Split-value: 5
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8181818181818182
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.8
True Belief accuracy: 0.9
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.6875

Date: 2024-04-29-12-07-11
Output type: multiple
Split-value: 1
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.15454545454545454
Trusted Testimony accuracy: 0.1
False Belief accuracy: 0.1
True Belief accuracy: 0.2
Late Label accuracy: 0.3333333333333333
Uninformative Label accuracy: 0.10526315789473684
Transparent Access accuracy: 0.125

Date: 2024-04-29-12-41-27
Output type: multiple
Split-value: 1
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.6636363636363637
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.4
True Belief accuracy: 0.8
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.5789473684210527
Transparent Access accuracy: 0.6875

Date: 2024-04-29-16-24-16
Output type: multiple
Split-value: 2
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.6545454545454545
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.3
True Belief accuracy: 0.9
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.6842105263157895
Transparent Access accuracy: 0.625

Date: 2024-04-29-22-58-34
Output type: multiple
Split-value: 3
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.6818181818181818
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.25
True Belief accuracy: 1.0
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.6842105263157895
Transparent Access accuracy: 0.75

Date: 2024-04-30-02-40-03
Output type: multiple
Split-value: 4
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.6909090909090909
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.6
True Belief accuracy: 0.8
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.5263157894736842
Transparent Access accuracy: 0.6875

Date: 2024-04-30-15-56-46
Output type: multiple
Split-value: 5
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.8181818181818182
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.55
True Belief accuracy: 1.0
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.9375

Date: 2024-05-11-16-45-17
Output type: multiple
Split-value: 1
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.8
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.85
True Belief accuracy: 0.7
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.6875

Date: 2024-05-11-23-07-23
Output type: multiple
Split-value: 2
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.8272727272727273
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.7
True Belief accuracy: 0.85
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 1.0

Date: 2024-05-13-20-47-52
Output type: multiple
Split-value: 1
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.7272727272727273
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.6
True Belief accuracy: 0.8
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.5625

Date: 2024-05-14-18-47-49
Output type: multiple
Split-value: 2
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.7454545454545455
Trusted Testimony accuracy: 0.75
False Belief accuracy: 0.65
True Belief accuracy: 0.95
Late Label accuracy: 0.9333333333333333
Uninformative Label accuracy: 0.631578947368421
Transparent Access accuracy: 0.5625

Date: 2024-05-17-19-15-02
Output type: multiple
Split-value: 1
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.5909090909090909
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.5
True Belief accuracy: 0.45
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.47368421052631576
Transparent Access accuracy: 0.5625

Date: 2024-05-17-19-37-31
Output type: multiple
Split-value: 2
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.6090909090909091
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.3
True Belief accuracy: 0.65
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.5789473684210527
Transparent Access accuracy: 0.5625

Date: 2024-05-17-20-00-55
Output type: multiple
Split-value: 3
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.5909090909090909
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.4
True Belief accuracy: 0.6
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.47368421052631576
Transparent Access accuracy: 0.4375

Date: 2024-05-17-20-26-00
Output type: multiple
Split-value: 4
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.6909090909090909
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.5
True Belief accuracy: 0.9
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.5789473684210527
Transparent Access accuracy: 0.5625

Date: 2024-05-17-20-46-57
Output type: multiple
Split-value: 5
Model: mixtral-7B-chat
Test on 110 samples :
All Accuracy: 0.5727272727272728
Trusted Testimony accuracy: 0.75
False Belief accuracy: 0.4
True Belief accuracy: 0.75
Late Label accuracy: 0.4666666666666667
Uninformative Label accuracy: 0.47368421052631576
Transparent Access accuracy: 0.5625

Date: 2024-05-18-04-08-20
Output type: multiple
Split-value: 3
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.5818181818181818
Trusted Testimony accuracy: 0.6
False Belief accuracy: 0.4
True Belief accuracy: 0.75
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.5789473684210527
Transparent Access accuracy: 0.4375

Date: 2024-05-18-18-27-49
Output type: multiple
Split-value: 1
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7727272727272727
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.7
True Belief accuracy: 0.7
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.6875

Date: 2024-05-18-18-32-06
Output type: multiple
Split-value: 2
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7909090909090909
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.75
True Belief accuracy: 0.65
Late Label accuracy: 0.9333333333333333
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.6875

Date: 2024-05-18-18-38-10
Output type: multiple
Split-value: 3
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7909090909090909
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.8
True Belief accuracy: 0.85
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.625

Date: 2024-05-18-18-46-19
Output type: multiple
Split-value: 4
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8363636363636363
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.85
True Belief accuracy: 0.75
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.75

Date: 2024-05-18-18-56-00
Output type: multiple
Split-value: 5
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.8
True Belief accuracy: 0.75
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.625

Date: 2024-05-19-00-30-07
Output type: multiple
Split-value: 4
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.5909090909090909
Trusted Testimony accuracy: 0.7
False Belief accuracy: 0.55
True Belief accuracy: 0.55
Late Label accuracy: 0.4666666666666667
Uninformative Label accuracy: 0.6842105263157895
Transparent Access accuracy: 0.5625

Date: 2024-05-19-02-59-34
Output type: multiple
Split-value: 3
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.4090909090909091
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.1
True Belief accuracy: 0.65
Late Label accuracy: 0.3333333333333333
Uninformative Label accuracy: 0.3684210526315789
Transparent Access accuracy: 0.125

Date: 2024-05-20-09-34-23
Output type: multiple
Split-value: 3
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.4
Trusted Testimony accuracy: 0.75
False Belief accuracy: 0.05
True Belief accuracy: 0.6
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.3157894736842105
Transparent Access accuracy: 0.125

Date: 2024-05-20-10-59-46
Output type: multiple
Split-value: 2
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.5727272727272728
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.45
True Belief accuracy: 0.5
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.5263157894736842
Transparent Access accuracy: 0.4375

Date: 2024-05-20-20-13-37
Output type: multiple
Split-value: 2
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.5454545454545454
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.55
True Belief accuracy: 0.45
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.3157894736842105
Transparent Access accuracy: 0.5

Date: 2024-05-21-03-08-03
Output type: multiple
Split-value: 5
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.41818181818181815
Trusted Testimony accuracy: 0.75
False Belief accuracy: 0.1
True Belief accuracy: 0.55
Late Label accuracy: 0.4666666666666667
Uninformative Label accuracy: 0.3684210526315789
Transparent Access accuracy: 0.25

Date: 2024-05-21-16-05-17
Output type: multiple
Split-value: 5
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.6727272727272727
Trusted Testimony accuracy: 0.65
False Belief accuracy: 0.6
True Belief accuracy: 0.75
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.631578947368421
Transparent Access accuracy: 0.625

Date: 2024-05-22-08-14-19
Output type: multiple
Split-value: 1
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.39090909090909093
Trusted Testimony accuracy: 0.75
False Belief accuracy: 0.1
True Belief accuracy: 0.5
Late Label accuracy: 0.4
Uninformative Label accuracy: 0.3157894736842105
Transparent Access accuracy: 0.25

Date: 2024-05-22-08-29-05
Output type: multiple
Split-value: 2
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.34545454545454546
Trusted Testimony accuracy: 0.7
False Belief accuracy: 0.05
True Belief accuracy: 0.45
Late Label accuracy: 0.4666666666666667
Uninformative Label accuracy: 0.2631578947368421
Transparent Access accuracy: 0.125

Date: 2024-05-23-07-49-21
Output type: multiple
Split-value: 3
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.6272727272727273
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.5
True Belief accuracy: 0.7
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.5263157894736842
Transparent Access accuracy: 0.5

Date: 2024-05-23-08-30-49
Output type: multiple
Split-value: 3
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.42727272727272725
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.1
True Belief accuracy: 0.6
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.3684210526315789
Transparent Access accuracy: 0.125

Date: 2024-05-23-20-04-55
Output type: multiple
Split-value: 4
Model: llama-3-8B-groq-chat
Test on 110 samples :
All Accuracy: 0.6090909090909091
Trusted Testimony accuracy: 0.7
False Belief accuracy: 0.65
True Belief accuracy: 0.55
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.631578947368421
Transparent Access accuracy: 0.5

Date: 2024-05-25-03-29-09
Output type: multiple
Split-value: 1
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.38181818181818183
Trusted Testimony accuracy: 0.7
False Belief accuracy: 0.1
True Belief accuracy: 0.45
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.3157894736842105
Transparent Access accuracy: 0.1875

Date: 2024-05-25-04-12-30
Output type: multiple
Split-value: 2
Model: llama-3-70B-chat
Test on 110 samples :
All Accuracy: 0.35454545454545455
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.05
True Belief accuracy: 0.5
Late Label accuracy: 0.4
Uninformative Label accuracy: 0.2631578947368421
Transparent Access accuracy: 0.0625

Date: 2024-06-06-14-01-46
Output type: multiple
Split-value: 1
Model: gpt-4-azure-chat
Test on 110 samples :
All Accuracy: 0.42727272727272725
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.0
True Belief accuracy: 0.65
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.42105263157894735
Transparent Access accuracy: 0.0625

Date: 2024-06-07-06-25-15
Output type: multiple
Split-value: 2
Model: gpt-4-azure-chat
Test on 110 samples :
All Accuracy: 0.44545454545454544
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.0
True Belief accuracy: 0.75
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.42105263157894735
Transparent Access accuracy: 0.0625

Date: 2024-06-08-06-15-33
Output type: multiple
Split-value: 3
Model: gpt-4-azure-chat
Test on 110 samples :
All Accuracy: 0.43636363636363634
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.0
True Belief accuracy: 0.7
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.3684210526315789
Transparent Access accuracy: 0.0625

Date: 2024-06-08-11-00-24
Output type: multiple
Split-value: 1
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7727272727272727
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.7
True Belief accuracy: 0.7
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.6875

Date: 2024-06-08-11-04-30
Output type: multiple
Split-value: 2
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7909090909090909
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.75
True Belief accuracy: 0.65
Late Label accuracy: 0.9333333333333333
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.6875

Date: 2024-06-08-11-10-22
Output type: multiple
Split-value: 3
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.75
True Belief accuracy: 0.85
Late Label accuracy: 0.8
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.6875

Date: 2024-06-08-11-18-12
Output type: multiple
Split-value: 4
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.7909090909090909
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.8
True Belief accuracy: 0.7
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.6875

Date: 2024-06-08-11-27-41
Output type: multiple
Split-value: 5
Model: gpt-3.5-azure-chat
Test on 110 samples :
All Accuracy: 0.8
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.8
True Belief accuracy: 0.7
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.8421052631578947
Transparent Access accuracy: 0.6875

Date: 2024-06-09-19-30-45
Output type: multiple
Split-value: 4
Model: gpt-4-azure-chat
Test on 110 samples :
All Accuracy: 0.5
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.1
True Belief accuracy: 0.8
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.47368421052631576
Transparent Access accuracy: 0.0625

