*-PLUIE: Personalisable metric with Llm Used for Improved Evaluation

Quentin Lemesle, Leane Jourdan, Daisy Munson, Pierre Alain, Jonathan Chevelu, Arnaud Delhay, Damien Lolive


Abstract
Evaluating the quality of automatically generated text often relies on LLM-as-a-judge (LLM-judge) methods. While effective, these approaches are computationally expensive and require post-processing of the generated output. To address these limitations, we build upon ParaPLUIE, a perplexity-based LLM-judge metric that estimates confidence over “Yes/No” answers without generating text. We introduce *-PLUIE, a set of task-specific prompting variants of ParaPLUIE, and evaluate their alignment with human judgement. Our experiments show that personalised *-PLUIE achieves stronger correlations with human ratings while maintaining low computational cost.
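The abstract does not spell out the scoring formula, so the following is only a minimal sketch of the general idea of reading confidence from "Yes/No" answer probabilities instead of generating text. It assumes the judge model's log-probabilities for the answers "Yes" and "No" are already available; the function names, the dictionary layout and the log-odds formulation are illustrative assumptions, not the authors' implementation.

```python
import math


def yes_no_log_odds(answer_logprobs: dict[str, float]) -> float:
    """Hypothetical helper: log p("Yes") - log p("No").

    `answer_logprobs` is assumed to hold the log-probability the judge
    model assigns to each answer after the task prompt. Positive values
    favour "Yes", negative values favour "No".
    """
    return answer_logprobs["Yes"] - answer_logprobs["No"]


def yes_confidence(answer_logprobs: dict[str, float]) -> float:
    """Hypothetical helper: p("Yes") renormalised over the two answers only."""
    return 1.0 / (1.0 + math.exp(-yes_no_log_odds(answer_logprobs)))


# Illustrative numbers only, not taken from the paper.
example = {"Yes": math.log(0.6), "No": math.log(0.2)}
score = yes_no_log_odds(example)
confidence = yes_confidence(example)
```

Because the score is computed from the answer probabilities directly, no decoding loop and no parsing of a generated reply are involved, which is consistent with the low-cost, no-post-processing motivation given above.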
Anthology ID:
2026.starsem-conference.14
Volume:
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Saif M. Mohammad, Nedjma Ousidhoum
Venues:
*SEM | WS
Publisher:
Association for Computational Linguistics
Pages:
211–243
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.starsem-conference.14/
Cite (ACL):
Quentin Lemesle, Leane Jourdan, Daisy Munson, Pierre Alain, Jonathan Chevelu, Arnaud Delhay, and Damien Lolive. 2026. *-PLUIE: Personalisable metric with Llm Used for Improved Evaluation. In Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026), pages 211–243, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
*-PLUIE: Personalisable metric with Llm Used for Improved Evaluation (Lemesle et al., *SEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.starsem-conference.14.pdf