Ian Apperly


2025

Automatic Scoring of an Open-Response Measure of Advanced Mind-Reading Using Large Language Models
Yixiao Wang | Russel Dsouza | Robert Lee | Ian Apperly | Rory Devine | Sanne van der Kleij | Mark Lee
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)

A rigorous psychometric approach is crucial for the accurate measurement of mind-reading abilities. Traditional scoring methods for such tests, which involve lengthy free-text responses, require considerable time and human effort. This study investigates the use of large language models (LLMs) to automate the scoring of psychometric tests. Data were collected from participants aged 13 to 30 years and scored by trained human coders to establish a benchmark. We evaluated multiple LLMs against human assessments, exploring various prompting strategies to optimize performance and fine-tuning the models on a subset of the collected data to enhance accuracy. Our results demonstrate that LLMs can assess advanced mind-reading abilities with over 90% accuracy on average. Notably, on most test items, the LLMs achieved higher Kappa agreement with the lead coder than two trained human coders did, highlighting their potential to reliably score open-response psychometric tests.
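
For illustration, the kind of coder agreement reported in the abstract is commonly quantified with Cohen's kappa alongside raw accuracy. The sketch below is not the paper's pipeline or data; it uses hypothetical item-level scores and scikit-learn to show how an LLM's scores might be compared against a lead coder's benchmark scores.

    # Minimal sketch (illustrative only): comparing hypothetical LLM scores
    # against a lead coder's scores using accuracy and Cohen's kappa.
    from sklearn.metrics import accuracy_score, cohen_kappa_score

    # Hypothetical per-response scores (e.g., 0 = no credit, 1 = partial, 2 = full)
    lead_coder_scores = [2, 1, 0, 2, 2, 1, 0, 2]
    llm_scores        = [2, 1, 0, 2, 1, 1, 0, 2]

    print("Accuracy:", accuracy_score(lead_coder_scores, llm_scores))
    print("Cohen's kappa:", cohen_kappa_score(lead_coder_scores, llm_scores))

Kappa is preferred over raw accuracy here because it corrects for the agreement two scorers would reach by chance, which matters when most responses fall into a few score categories.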