Date: 2024-02-23-15-49-32
Model: gpt-3.5-azure
Test on 110 samples :
All Accuracy: 0.8363636363636363
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.85
True Belief accuracy: 0.8
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.8947368421052632
Transparent Access accuracy: 0.875

Date: 2024-04-06-14-14-31
Model: gpt-3.5-azure
Test on 110 samples :
All Accuracy: 0.8363636363636363
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.85
True Belief accuracy: 0.8
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.8947368421052632
Transparent Access accuracy: 0.9375

Date: 2024-04-06-15-24-38
Model: gpt-3.5-azure
Test on 100 samples :
All Accuracy: 0.74
Trusted Testimony accuracy: 0.9473684210526315
False Belief accuracy: 0.6470588235294118
True Belief accuracy: 0.8947368421052632
Late Label accuracy: 0.5714285714285714
Uninformative Label accuracy: 0.75
Transparent Access accuracy: 0.5333333333333333

Date: 2024-04-07-13-02-37
Model: gpt-3.5-azure
Test on 110 samples :
All Accuracy: 0.7181818181818181
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.6
True Belief accuracy: 0.9
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.5625

Date: 2024-04-13-15-51-44
Output type: multiple
Model: gpt-3.5-azure
Test on 110 samples :
All Accuracy: 0.7
Trusted Testimony accuracy: 0.9
False Belief accuracy: 0.65
True Belief accuracy: 0.8
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.5

Date: 2024-04-26-08-10-12
Output type: multiple
Split-value: 1
Model: gpt-3.5-azure
Test on 110 samples :
All Accuracy: 0.6272727272727273
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.45
True Belief accuracy: 0.7
Late Label accuracy: 0.5333333333333333
Uninformative Label accuracy: 0.6842105263157895
Transparent Access accuracy: 0.5

Date: 2024-04-27-09-49-46
Output type: multiple
Split-value: 1
Model: gpt-3.5-azure
Test on 110 samples :
All Accuracy: 0.7545454545454545
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.65
True Belief accuracy: 0.95
Late Label accuracy: 0.6666666666666666
Uninformative Label accuracy: 0.6842105263157895
Transparent Access accuracy: 0.5625

Date: 2024-04-28-02-56-47
Output type: multiple
Split-value: 1
Model: mixtral-7B
Test on 110 samples :
All Accuracy: 0.34545454545454546
Trusted Testimony accuracy: 0.4
False Belief accuracy: 0.3
True Belief accuracy: 0.25
Late Label accuracy: 0.4
Uninformative Label accuracy: 0.42105263157894735
Transparent Access accuracy: 0.3125

Date: 2024-05-03-15-21-53
Output type: multiple
Split-value: 1
Model: llama-3-70B
Test on 110 samples :
All Accuracy: 0.7
Trusted Testimony accuracy: 0.95
False Belief accuracy: 0.55
True Belief accuracy: 0.7
Late Label accuracy: 0.6
Uninformative Label accuracy: 0.7894736842105263
Transparent Access accuracy: 0.5625

Date: 2024-05-07-00-02-20
Output type: multiple
Split-value: 1
Model: llama-3-8B-groq
Test on 110 samples :
All Accuracy: 0.5181818181818182
Trusted Testimony accuracy: 0.55
False Belief accuracy: 0.55
True Belief accuracy: 0.75
Late Label accuracy: 0.26666666666666666
Uninformative Label accuracy: 0.47368421052631576
Transparent Access accuracy: 0.4375

Date: 2024-05-17-18-33-29
Output type: multiple
Split-value: 1
Model: mixtral-7B
Test on 110 samples :
All Accuracy: 0.7090909090909091
Trusted Testimony accuracy: 1.0
False Belief accuracy: 0.5
True Belief accuracy: 0.6
Late Label accuracy: 0.8666666666666667
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.5625

Date: 2024-05-18-18-16-40
Output type: multiple
Split-value: 1
Model: gpt-3.5-azure
Test on 110 samples :
All Accuracy: 0.7090909090909091
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.65
True Belief accuracy: 0.65
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.625

Date: 2024-05-20-10-22-27
Output type: multiple
Split-value: 1
Model: llama-3-8B-groq
Test on 110 samples :
All Accuracy: 0.4727272727272727
Trusted Testimony accuracy: 0.5
False Belief accuracy: 0.45
True Belief accuracy: 0.5
Late Label accuracy: 0.4666666666666667
Uninformative Label accuracy: 0.5263157894736842
Transparent Access accuracy: 0.375

Date: 2024-05-30-13-59-30
Output type: multiple
Split-value: 1
Model: gpt-4-azure
Test on 110 samples :
All Accuracy: 0.44545454545454544
Trusted Testimony accuracy: 0.8
False Belief accuracy: 0.1
True Belief accuracy: 0.65
Late Label accuracy: 0.4666666666666667
Uninformative Label accuracy: 0.5263157894736842
Transparent Access accuracy: 0.0625

Date: 2024-06-08-11-35-00
Output type: multiple
Split-value: 5
Model: gpt-3.5-azure
Test on 110 samples :
All Accuracy: 0.7181818181818181
Trusted Testimony accuracy: 0.85
False Belief accuracy: 0.65
True Belief accuracy: 0.65
Late Label accuracy: 0.7333333333333333
Uninformative Label accuracy: 0.7368421052631579
Transparent Access accuracy: 0.6875

