PakBBQ: A Culturally Adapted Bias Benchmark for QA

Abdullah Hashmat, Muhammad Arham Mirza, Agha Ali Raza


Abstract
With the widespread adoption of Large Language Models (LLMs) across various applications, it is imperative to ensure their fairness across all user communities. However, most LLMs are trained and evaluated on Western-centric data, with little attention paid to low-resource languages and regional contexts. To address this gap, we introduce PakBBQ, a culturally and regionally adapted extension of the original Bias Benchmark for Question Answering (BBQ) dataset. PakBBQ comprises over 214 templates and 17,180 QA pairs in both English and Urdu, covering eight bias dimensions relevant in Pakistan: age, disability, appearance, gender, socio-economic status, religion, regional affiliation, and language formality. We evaluate multiple multilingual LLMs under both ambiguous and explicitly disambiguated contexts, as well as negative versus non-negative question framings. Our experiments reveal (i) an average accuracy gain of 12% with disambiguation, (ii) consistently stronger counter-bias behaviors in Urdu than in English, and (iii) marked framing effects that reduce stereotypical responses when questions are posed negatively. These findings highlight the importance of contextualized benchmarks and simple prompt engineering strategies for bias mitigation in low-resource settings.
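The comparison between ambiguous and disambiguated contexts can be illustrated with a minimal sketch. It assumes each evaluated item is a record with hypothetical `condition`, `prediction`, and `label` fields; these names, and the toy records, are invented for illustration and are not taken from the PakBBQ release or the paper's evaluation code.

```python
from collections import defaultdict


def accuracy_by_condition(records):
    """Return {condition: accuracy} over records with
    'condition', 'prediction', and 'label' keys (hypothetical schema)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["condition"]] += 1
        correct[r["condition"]] += int(r["prediction"] == r["label"])
    return {c: correct[c] / total[c] for c in total}


# Toy records, invented for illustration only (not PakBBQ data).
toy = [
    {"condition": "ambiguous", "prediction": "unknown", "label": "unknown"},
    {"condition": "ambiguous", "prediction": "A", "label": "unknown"},
    {"condition": "disambiguated", "prediction": "A", "label": "A"},
    {"condition": "disambiguated", "prediction": "B", "label": "B"},
]

acc = accuracy_by_condition(toy)
# Accuracy gain from disambiguation, in absolute terms.
gain = acc["disambiguated"] - acc["ambiguous"]
print(acc, gain)
```

The same grouping could be keyed on language or question framing instead of context condition, assuming the records carry those fields.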
Anthology ID:
2025.emnlp-main.818
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
16171–16183
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.818/
Cite (ACL):
Abdullah Hashmat, Muhammad Arham Mirza, and Agha Ali Raza. 2025. PakBBQ: A Culturally Adapted Bias Benchmark for QA. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 16171–16183, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
PakBBQ: A Culturally Adapted Bias Benchmark for QA (Hashmat et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.818.pdf
Checklist:
 2025.emnlp-main.818.checklist.pdf