Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?

Prateek Kumar Rajput, Yewei Song, Iyiola Emmanuel Olatunji, Jacques Klein, Tegawendé Bissyande


Abstract
Can large language models reliably express a human-like personality, or are they merely mimicking surface cues without a stable underlying profile? We study this question on the long-form Essays Dataset, which we prefer over short, mood-driven text because it better targets stable traits. Using IPIP-NEO, a questionnaire-based self-evaluation test, we ask: (i) does post-training (SFT, DPO, ORPO) stabilize questionnaire scores under prompt rephrasings, and (ii) can it induce target Big Five profiles from unguided essays? Across five models, fine-tuning consistently reduces variance in questionnaire responses, mitigating the fragility seen in pre-trained models. Yet accuracy on the full five-dimensional profile remains near chance even when single-trait scores improve, indicating that unguided essays lack the cues needed for faithful personality expression. We argue for scenario-grounded datasets or interactive elicitation that accumulates test-aligned evidence over time.
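The two quantities contrasted in the abstract, score variance across prompt rephrasings and accuracy on the full five-trait profile, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes each trait is reduced to a binary high/low label, and the function names and data layout are hypothetical. Under the further assumption of balanced, independent labels, a per-trait chance rate of 0.5 gives a full-profile chance rate of 0.5^5, about 3%, which shows why single-trait gains need not translate into exact-profile accuracy.

```python
from statistics import pvariance

# Big Five traits. Binary labels (1 = high, 0 = low) are an assumption
# made for this illustration, not a detail stated in the abstract.
TRAITS = ("O", "C", "E", "A", "N")

def score_variance(scores_by_prompt):
    """Per-trait population variance of questionnaire scores,
    computed across rephrasings of the same prompt."""
    return {t: pvariance([s[t] for s in scores_by_prompt]) for t in TRAITS}

def per_trait_accuracy(preds, golds):
    """Fraction of examples where each individual trait label matches."""
    n = len(golds)
    return {t: sum(int(p[t] == g[t]) for p, g in zip(preds, golds)) / n
            for t in TRAITS}

def full_profile_accuracy(preds, golds):
    """Fraction of examples where all five trait labels match at once."""
    n = len(golds)
    hits = sum(int(all(p[t] == g[t] for t in TRAITS))
               for p, g in zip(preds, golds))
    return hits / n

# Chance level for an exact five-trait match, assuming balanced and
# independent binary labels (hypothetical simplification).
CHANCE_FULL_PROFILE = 0.5 ** len(TRAITS)
```

In this setup a model can score well on every trait individually while full-profile accuracy stays low, because one mismatched trait is enough to fail the exact-match criterion.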
Anthology ID:
2026.lrec-main.881
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
11272–11285
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.881/
Cite (ACL):
Prateek Kumar Rajput, Yewei Song, Iyiola Emmanuel Olatunji, Jacques Klein, and Tegawendé Bissyande. 2026. Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?. International Conference on Language Resources and Evaluation, main:11272–11285.
Cite (Informal):
Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost? (Rajput et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.881.pdf