Lessons from a User Experience Evaluation of NLP Interfaces

Eduardo Calò, Lydia Penkert, Saad Mahamood


Abstract
Human evaluations lie at the heart of evaluation within the field of Natural Language Processing (NLP). Although they are regarded as the “gold standard” of evaluation, questions have been raised about whether these evaluations are both reproducible and repeatable. One overlooked aspect is the design choices researchers make when designing user interfaces (UIs). In this paper, four UIs used in past NLP human evaluations are assessed by UX experts, based on standardized human-centered interaction principles. Building on these insights, we derive several recommendations that the NLP community should apply when designing UIs, to enable more consistent human evaluation responses.
Anthology ID:
2025.findings-naacl.159
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2915–2929
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.159/
Cite (ACL):
Eduardo Calò, Lydia Penkert, and Saad Mahamood. 2025. Lessons from a User Experience Evaluation of NLP Interfaces. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2915–2929, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Lessons from a User Experience Evaluation of NLP Interfaces (Calò et al., Findings 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.159.pdf