Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition

Kyuhee Kim; Sangah Lee

Abstract

As large language models (LLMs) become key advisors in various domains, their cultural sensitivity and reasoning skills are crucial in multicultural environments. We introduce Nunchi-Bench, a benchmark designed to evaluate LLMs’ cultural understanding, with a focus on Korean superstitions. The benchmark consists of 247 questions spanning 31 topics, assessing factual knowledge, culturally appropriate advice, and situational interpretation. We evaluate multilingual LLMs in both Korean and English to analyze their ability to reason about Korean cultural contexts and to examine how the prompt language affects performance. To systematically assess cultural reasoning, we propose a novel verification strategy with customized scoring metrics that capture the extent to which models recognize cultural nuances and respond appropriately. Our findings highlight significant challenges in LLMs’ cultural reasoning. While models generally recognize factual information, they struggle to apply it in practical scenarios. Furthermore, explicit cultural framing improves performance more effectively than relying solely on the language of the prompt. To support further research, we publicly release Nunchi-Bench alongside a leaderboard.

Anthology ID: 2025.findings-acl.794
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 15328–15342
URL: https://preview.aclanthology.org/landing_page/2025.findings-acl.794/
Cite (ACL): Kyuhee Kim and Sangah Lee. 2025. Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15328–15342, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition (Kim & Lee, Findings 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.findings-acl.794.pdf
