Evaluating Cultural and Social Awareness of LLM Web Agents

Haoyi Qiu, Alexander Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, Chien-Sheng Wu


Abstract
As large language models (LLMs) expand into performing as agents for real-world applications beyond traditional NLP tasks, evaluating their robustness becomes increasingly important. However, existing benchmarks often overlook critical dimensions like cultural and social awareness. To address this gap, we introduce CASA, a benchmark designed to assess LLM agents’ sensitivity to cultural and social norms across two web-based tasks: online shopping and social discussion forums. Our approach evaluates LLM agents’ ability to detect and appropriately respond to norm-violating user queries and observations. Furthermore, we propose a comprehensive evaluation framework that measures awareness coverage, helpfulness in managing user queries, and the violation rate when facing misleading web content. Experiments show that current LLMs perform significantly better in non-agent environments than in web-based agent environments, with agents achieving less than 10% awareness coverage and over 40% violation rates. To improve performance, we explore two methods, prompting and fine-tuning, and find that combining them can offer complementary advantages: fine-tuning on culture-specific datasets significantly enhances the agents’ ability to generalize across different regions, while prompting boosts the agents’ ability to navigate complex tasks. These findings highlight the importance of constantly benchmarking LLM agents’ cultural and social awareness during the development cycle.
Anthology ID:
2025.findings-naacl.222
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3978–4005
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.222/
Cite (ACL):
Haoyi Qiu, Alexander Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, and Chien-Sheng Wu. 2025. Evaluating Cultural and Social Awareness of LLM Web Agents. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3978–4005, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Evaluating Cultural and Social Awareness of LLM Web Agents (Qiu et al., Findings 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.222.pdf