Adaptive Data Collection for Latin-American Community-sourced Evaluation of Stereotypes (LACES)

Guido Ivetta, Pietro Palombini, Sofía Martinelli, Marcos J Gomez, M Emilia Echeveste, Sunipa Dev, Vinodkumar Prabhakaran, Luciana Benotti


Abstract
The evaluation of societal biases in NLP models is critically hindered by a geo-cultural gap. This leaves regions such as Latin America severely underserved, making it impossible to adequately assess or mitigate the perpetuation of harmful regional stereotypes in language technologies. This paper presents LACES, a stereotype association dataset for 15 Latin American countries. The dataset includes 4,789 stereotype associations, manually created and annotated by 83 participants (the de-identified dataset can be accessed via GitHub). It was developed through targeted community partnerships across Latin America. Additionally, we propose a novel adaptive data collection methodology that integrates the sourcing of new stereotype entries and the validation of existing data within a single, unified workflow. This approach results in a resource with more unique stereotypes than previous static collection methods, enabling more efficient stereotype collection. The paper further supports the quality of LACES by demonstrating reduced efficacy of debiasing methods on this dataset in comparison to existing popular stereotype benchmarks. Content Warning: This research involves the study of social biases. Consequently, the paper contains examples of discriminatory language and stereotypes that may be sensitive or upsetting to readers. These examples are included for the purpose of scientific analysis and do not reflect the views of the authors.
Anthology ID:
2026.findings-acl.203
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4177–4190
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.203/
Cite (ACL):
Guido Ivetta, Pietro Palombini, Sofía Martinelli, Marcos J Gomez, M Emilia Echeveste, Sunipa Dev, Vinodkumar Prabhakaran, and Luciana Benotti. 2026. Adaptive Data Collection for Latin-American Community-sourced Evaluation of Stereotypes (LACES). In Findings of the Association for Computational Linguistics: ACL 2026, pages 4177–4190, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Adaptive Data Collection for Latin-American Community-sourced Evaluation of Stereotypes (LACES) (Ivetta et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.203.pdf
Checklist:
2026.findings-acl.203.checklist.pdf