Robust Estimation of Population-Level Effects in Repeated-Measures NLP Experimental Designs

Alejandro Benito-Santos, Adrian Ghajari, Víctor Fresno


Abstract
NLP research frequently grapples with multiple sources of variability—spanning runs, datasets, annotators, and more—yet conventional analysis methods often neglect these hierarchical structures, threatening the reproducibility of findings. To address this gap, we contribute a case study illustrating how linear mixed-effects models (LMMs) can rigorously capture systematic language-dependent differences (i.e., population-level effects) in a population of monolingual and multilingual language models. In the context of a bilingual hate speech detection task, we demonstrate that LMMs can uncover significant population-level effects—even under low-resource (small-N) experimental designs—while mitigating confounds and random noise. By setting out a transparent blueprint for repeated-measures experimentation, we encourage the NLP community to embrace variability as a feature, rather than a nuisance, in order to advance more robust, reproducible, and ultimately trustworthy results.
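The abstract describes fitting linear mixed-effects models to repeated-measures evaluation scores, with language as a population-level (fixed) effect and per-model variation as a random effect. A minimal illustrative sketch of this kind of analysis (using synthetic data and the `statsmodels` MixedLM API, not the paper's actual data or code) might look like:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical illustration: simulate F1 scores for a small (small-N)
# population of models evaluated over repeated runs in two languages,
# with a model-specific random intercept.
rng = np.random.default_rng(0)
rows = []
for m in [f"model_{i}" for i in range(6)]:
    model_offset = rng.normal(0, 0.03)            # per-model random effect
    for lang, lang_effect in [("en", 0.0), ("es", -0.05)]:
        for run in range(5):                      # repeated measures (runs)
            f1 = 0.75 + lang_effect + model_offset + rng.normal(0, 0.02)
            rows.append({"model": m, "language": lang, "f1": f1})
df = pd.DataFrame(rows)

# LMM: language as a fixed (population-level) effect,
# random intercept grouped by model.
lmm = smf.mixedlm("f1 ~ language", data=df, groups=df["model"]).fit()
print(lmm.summary())
```

Here the coefficient for `language[T.es]` estimates the systematic language-dependent difference, separated from model-level and run-level noise; the grouping by model is what makes the design repeated-measures rather than a pooled comparison.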
Anthology ID:
2025.acl-long.1586
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
33076–33089
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1586/
Cite (ACL):
Alejandro Benito-Santos, Adrian Ghajari, and Víctor Fresno. 2025. Robust Estimation of Population-Level Effects in Repeated-Measures NLP Experimental Designs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33076–33089, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Robust Estimation of Population-Level Effects in Repeated-Measures NLP Experimental Designs (Benito-Santos et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1586.pdf