PBBQ: A Persian Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

Farhan Farsi, Shayan Bali, Fatemeh Valeh, Parsa Ghofrani, Alireza Pakniat, Seyedkian Kashfipour, Amir H. Payberah


Abstract
With the increasing adoption of large language models (LLMs), ensuring their alignment with social norms has become a critical concern. While prior research has examined bias detection in various languages, there remains a significant gap in resources addressing social biases within Persian cultural contexts. In this work, we introduce PBBQ, a comprehensive benchmark dataset designed to evaluate social biases of LLMs in Persian. Our benchmark, which encompasses 16 cultural categories, was developed through anonymous questionnaires completed by 250 individuals across multiple demographics, in close collaboration with social science experts to ensure its validity. The resulting PBBQ dataset contains over 37,000 carefully curated questions, providing a foundation for evaluating and mitigating bias in Persian language models. We benchmark several open-source LLMs, a closed-source model, and Persian-specific fine-tuned models on PBBQ. Our findings reveal that current LLMs exhibit significant social biases across Persian cultural categories. Additionally, by comparing model outputs to human responses, we observe that LLMs often replicate human bias patterns, highlighting the complex interplay between learned representations and cultural stereotypes. The PBBQ dataset is publicly available to support future work. Content warning: This paper contains unsafe content.
Anthology ID:
2026.lrec-main.313
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
3944–3960
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.313/
Cite (ACL):
Farhan Farsi, Shayan Bali, Fatemeh Valeh, Parsa Ghofrani, Alireza Pakniat, Seyedkian Kashfipour, and Amir H. Payberah. 2026. PBBQ: A Persian Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 3944–3960, Palma de Mallorca, Spain.
Cite (Informal):
PBBQ: A Persian Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models (Farsi et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.313.pdf