0000000000000000000000000000000000000000 001a4ba1d2fa1bc2154a6edd2e36b5375b7c949a Luca Benedetto <luca.benedetto93@gmail.com> 1699264111 +0000	branch: Created from HEAD
001a4ba1d2fa1bc2154a6edd2e36b5375b7c949a 138ac6080cd2bf1da7b2ea445954e1d74dd530bf Luca Benedetto <luca.benedetto93@gmail.com> 1699264246 +0000	commit: add tmp to gitignore
138ac6080cd2bf1da7b2ea445954e1d74dd530bf 7e4fc4442b41dca0814a75f18a28d9699653366d Luca Benedetto <luca.benedetto93@gmail.com> 1699265165 +0000	commit: add method to keep only questions answered by all role-played levels
7e4fc4442b41dca0814a75f18a28d9699653366d e39ea53b01cf2d8bb4a0ed0292d2b77508a2f05e Luca Benedetto <luca.benedetto93@gmail.com> 1699265830 +0000	commit: add numpy to requirements
e39ea53b01cf2d8bb4a0ed0292d2b77508a2f05e 33aafd8bcb941e9865cf277679e5a26dc2dc314a Luca Benedetto <luca.benedetto93@gmail.com> 1699267752 +0000	commit: add difficulty levels for the two datasets to the constants file
33aafd8bcb941e9865cf277679e5a26dc2dc314a 870345987898aa6be166563c56b053de327c1163 Luca Benedetto <luca.benedetto93@gmail.com> 1699267982 +0000	commit: add method get_original_dataset to get the whole test datasets
870345987898aa6be166563c56b053de327c1163 e10102d111e7bed64b2231236fa01960eda5c32f Luca Benedetto <luca.benedetto93@gmail.com> 1699268117 +0000	commit: add method to get dict that maps from qid to correct answer
e10102d111e7bed64b2231236fa01960eda5c32f 916bf5dc356f83624ecc44e6f63ef9eef3710785 Luca Benedetto <luca.benedetto93@gmail.com> 1699268239 +0000	commit: add method that returns dict that maps from qid to true difficulty
916bf5dc356f83624ecc44e6f63ef9eef3710785 f9fbad4126ff30523c71d5b052328e6a64ac1ee8 Luca Benedetto <luca.benedetto93@gmail.com> 1699268504 +0000	commit: add method that returns the mapping from "true" difficulty to the set of questions with that difficulty
f9fbad4126ff30523c71d5b052328e6a64ac1ee8 10f79ef43e786faee03227dc011cd5e7914afbf6 Luca Benedetto <luca.benedetto93@gmail.com> 1699269344 +0000	commit: add method that returns the avg accuracy per role-played level (both overall and per difficulty level)
10f79ef43e786faee03227dc011cd5e7914afbf6 8839ca4311ba306417edadc70f178040cb5a4e38 Luca Benedetto <luca.benedetto93@gmail.com> 1699269915 +0000	commit: add method get_response_correctness_per_model to utils
8839ca4311ba306417edadc70f178040cb5a4e38 5801297e9aeacf2b3b065841f29da312bfd94278 Luca Benedetto <luca.benedetto93@gmail.com> 1699270393 +0000	commit: add matplotlib to requirements
5801297e9aeacf2b3b065841f29da312bfd94278 9e930d9925e93bbf234ff109bbb324a3ae530903 Luca Benedetto <luca.benedetto93@gmail.com> 1699271650 +0000	commit: fix method get_response_correctness_per_model
9e930d9925e93bbf234ff109bbb324a3ae530903 b9b071a68c28cd989d0b0ae2027a5d3db770443c Luca Benedetto <luca.benedetto93@gmail.com> 1699271871 +0000	commit: add first version of methods plot_accuracy_per_model and plot_accuracy_per_difficulty_per_model
b9b071a68c28cd989d0b0ae2027a5d3db770443c b6cef248c0e8c583b840c65e329574b77d57841b Luca Benedetto <luca.benedetto93@gmail.com> 1699272342 +0000	commit: add formatting of plot for method plot_accuracy_per_model
b6cef248c0e8c583b840c65e329574b77d57841b bcf6b788f0753eeaae7856987533cdaebea5477e Luca Benedetto <luca.benedetto93@gmail.com> 1699272575 +0000	commit: clean formatting of method plot_accuracy_per_difficulty_per_model
bcf6b788f0753eeaae7856987533cdaebea5477e 75ce32679470f4bc868ccbdc9988af0ba8bb040a Luca Benedetto <luca.benedetto93@gmail.com> 1699273205 +0000	commit: add first version of the script to eval the LLM responses
75ce32679470f4bc868ccbdc9988af0ba8bb040a e953b609f857369731a25a44ec3456115662e1f4 Luca Benedetto <luca.benedetto93@gmail.com> 1699273758 +0000	commit: add method plot_accuracy_per_difficulty_for_different_role_played_levels
e953b609f857369731a25a44ec3456115662e1f4 931701ed8de492ad2338ce19819bbbf834410f81 Luca Benedetto <luca.benedetto93@gmail.com> 1699275056 +0000	commit: add seaborn to requirements
931701ed8de492ad2338ce19819bbbf834410f81 445c964becbbe75fc8c802dcb4589ad510e5490a Luca Benedetto <luca.benedetto93@gmail.com> 1699275255 +0000	commit: add plot to study correlation between QA accuracy and true difficulty
445c964becbbe75fc8c802dcb4589ad510e5490a 269476893960f566b5d58e25802767be3290a482 Luca Benedetto <luca.benedetto93@gmail.com> 1699275370 +0000	commit: change definition of folder_name
269476893960f566b5d58e25802767be3290a482 743a48d3bd03abf2cac32a721bc08d338e910bbf Luca Benedetto <luca.benedetto93@gmail.com> 1699275390 +0000	commit: add first complete version of the script to plot the analysis of the results
