AlignCultura: Towards Culturally Aligned Large Language Models?

Gautam Siddharth Kashyap; Mark Dras; Usman Naseem

Abstract

Cultural alignment in Large Language Models (LLMs) is essential for producing contextually aware, respectful, and trustworthy outputs. Without it, models risk generating stereotyped, insensitive, or misleading responses that fail to reflect cultural diversity with respect to the Helpful, Harmless, and Honest (HHH) paradigm. Existing benchmarks represent early steps toward cultural alignment, yet no benchmark currently enables systematic evaluation of cultural alignment in line with UNESCO’s principles of cultural diversity under the HHH paradigm. To address this gap, we build AlignCultura, a two-stage pipeline for cultural alignment. Stage I constructs CULTURAX, an HHH-English dataset grounded in the UNESCO cultural taxonomy. Its Query Construction step reclassifies prompts, expands underrepresented domains (or labels), and prevents data leakage with SimHash; its Response Generation step then pairs prompts with culturally grounded responses via two-stage rejection sampling. The final dataset contains 1,500 samples spanning 30 subdomains of tangible and intangible cultural forms. Stage II benchmarks CULTURAX on general-purpose models, culturally fine-tuned models, and open-weight LLMs (Qwen3-8B and DeepSeek-R1-Distill-Qwen-7B). Empirically, culturally fine-tuned models improve joint HHH by 4%–6%, reduce cultural failures by 18%, achieve 10%–12% efficiency gains, and limit leakage to 0.3%.
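
The abstract names SimHash as the leakage check in Query Construction but gives no parameters. As a rough illustration only, the sketch below shows how a SimHash near-duplicate filter could work in general. The 64-bit fingerprint, word 3-gram shingles, MD5 as the per-shingle hash, the Hamming threshold of 3, and the function names `simhash`, `hamming`, and `is_leak` are all assumptions made for this example, not details taken from the paper.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a SimHash fingerprint built from word 3-gram shingles (assumed setup)."""
    tokens = re.findall(r"\w+", text.lower())
    # Short texts fall back to a single shingle holding all tokens.
    shingles = [" ".join(tokens[i:i + 3]) for i in range(max(1, len(tokens) - 2))]
    weights = [0] * bits
    for shingle in shingles:
        digest = hashlib.md5(shingle.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each shingle votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes sum to a positive value.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_leak(candidate: str, reference_prompts: list[str], max_distance: int = 3) -> bool:
    """Flag a candidate whose fingerprint is within max_distance bits of any reference."""
    fp = simhash(candidate)
    return any(hamming(fp, simhash(ref)) <= max_distance for ref in reference_prompts)
```

Under these assumptions, a candidate prompt would be dropped when `is_leak(candidate, benchmark_prompts)` returns `True`. A looser threshold catches more paraphrases at the cost of more false positives, so the value would need tuning against the actual data.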

Anthology ID: 2026.acl-long.1762
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 37986–37996
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1762/
Cite (ACL): Gautam Siddharth Kashyap, Mark Dras, and Usman Naseem. 2026. AlignCultura: Towards Culturally Aligned Large Language Models? In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 37986–37996, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): AlignCultura: Towards Culturally Aligned Large Language Models? (Kashyap et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1762.pdf
Checklist: 2026.acl-long.1762.checklist.pdf
