Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration

Weicheng Ma; John J. Guerrerio; Soroush Vosoughi

Abstract

Research on stereotypes in large language models (LLMs) has largely focused on English-speaking contexts, due to the lack of datasets in other languages and the high cost of manual annotation in underrepresented cultures. To address this gap, we introduce a cost-efficient human-LLM collaborative annotation framework and apply it to construct EspanStereo, a Spanish-language stereotype dataset spanning multiple Spanish-speaking countries across Europe and Latin America. EspanStereo captures both well-documented stereotypes from prior literature and culturally specific biases absent from English-centric resources. Using LLMs to generate candidate stereotypes and in-culture annotators to validate them, we demonstrate the framework’s effectiveness in identifying nuanced, region-specific biases. Our evaluation of Spanish-supporting LLMs using EspanStereo reveals significant variation in stereotypical behavior across countries, highlighting the need for more culturally grounded assessments. Beyond Spanish, our framework is adaptable to other languages and regions, offering a scalable path toward multilingual stereotype benchmarks. This work broadens the scope of stereotype analysis in LLMs and lays the groundwork for comprehensive cross-cultural bias evaluation.

Anthology ID: 2025.emnlp-main.1221
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 23939–23967
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1221/
Cite (ACL): Weicheng Ma, John J. Guerrerio, and Soroush Vosoughi. 2025. Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 23939–23967, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration (Ma et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1221.pdf
Checklist: 2025.emnlp-main.1221.checklist.pdf