Junyeong Park
2026
Investigating Counterfactual Unfairness in LLMs towards Identities through Humor
Shubin Kim | Yejin Son | Junyeong Park | Keummin Ka | Seungbeen Lee | Jaeyoung Lee | Hyeju Jang | Alice Oh | Youngjae Yu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Humor holds up a mirror to social perception: what we find funny often reflects who we are and how we judge others. When language models engage with humor, their reactions expose the social assumptions they have internalized from training data. In this paper, we investigate counterfactual unfairness through humor by observing how the model’s responses change when we swap who speaks and who is addressed while holding other factors constant. Our framework spans three tasks: humor generation refusal, speaker intention inference, and relational/societal impact prediction, covering both identity-agnostic humor and identity-specific disparagement humor. We introduce interpretable bias metrics that capture asymmetric patterns under identity swaps. Experiments across state-of-the-art models reveal consistent relational disparities: jokes told by privileged speakers are refused up to 67.5% more often, judged as malicious 64.7% more frequently, and rated up to 1.5 points higher in social harm on a 5-point scale. These patterns highlight how sensitivity and stereotyping coexist in generative models, complicating efforts toward fairness and cultural alignment.
Are they lovers or friends? Evaluating LLMs’ Social Reasoning in English and Korean Dialogues
Eunsu Kim | Junyeong Park | Juhyun Oh | Kiwoong Park | Seyoung Song | A. Seza Doğruöz | Alice Oh | Najoung Kim
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As LLMs are increasingly deployed in real-world interactions, their social reasoning in interpersonal communication becomes critical. To explore their capabilities, we introduce SCRIPTS, a 1.1k-dialogue dataset in English and Korean, sourced from movie scripts, and propose a social reasoning task based on SCRIPTS that evaluates the capacity of LLMs to infer the social relationships (e.g., friends, lovers) between speakers in each dialogue. When we evaluate nine models on our task, current LLMs achieve around 75–80% on the English dataset and 58–69% on the Korean dataset, and they predict an Unlikely relationship in 10–25% of responses in both languages. Furthermore, we find that thinking models and chain-of-thought prompting provide minimal benefits for social reasoning and occasionally amplify social biases. In sum, there are significant limitations in current LLMs’ social reasoning capabilities, especially for Korean, highlighting the need for efforts to develop socially-aware LLMs across languages.
2025
Survey of Cultural Awareness in Language Models: Text and Beyond
Siddhesh Pawar | Junyeong Park | Jiho Jin | Arnav Arora | Junho Myung | Srishti Yadav | Faiz Ghifari Haznitrama | Inhwa Song | Alice Oh | Isabelle Augenstein
Computational Linguistics, Volume 51, Issue 3 - September 2025
Large-scale deployment of large language models (LLMs) in various applications, such as chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure inclusivity. Culture has been widely studied in psychology and anthropology, and there has been a recent surge in research on making LLMs more culturally inclusive, going beyond multilinguality and building on findings from psychology and anthropology. In this article, we survey efforts towards incorporating cultural awareness into text-based and multimodal LLMs. We start by defining cultural awareness in LLMs, taking definitions of culture from the anthropology and psychology literature as a point of departure. We then examine methodologies adopted for creating cross-cultural datasets, strategies for cultural inclusion in downstream tasks, and methodologies that have been used for benchmarking cultural awareness in LLMs. Further, we discuss the ethical implications of cultural alignment, the role of human–computer interaction in driving cultural inclusion in LLMs, and the role of cultural alignment in driving social science research. We finally provide pointers to future research based on our findings about gaps in the literature.
LLM-C3MOD: A Human-LLM Collaborative System for Cross-Cultural Hate Speech Moderation
Junyeong Park | Seogyeong Jeong | Seyoung Song | Yohan Lee | Alice Oh
Proceedings of the 3rd Workshop on Cross-Cultural Considerations in NLP (C3NLP 2025)
Content moderation platforms concentrate resources on English content despite serving predominantly non-English speaking users. Also, given the scarcity of native moderators for low-resource languages, non-native moderators must bridge this gap in moderation tasks such as hate speech moderation. Through a user study, we identify that non-native moderators struggle with understanding culturally-specific knowledge, sentiment, and internet culture in hate speech. To assist non-native moderators, we present LLM-C3MOD, a human-LLM collaborative pipeline with three steps: (1) RAG-enhanced cultural context annotations; (2) initial LLM-based moderation; and (3) targeted human moderation for cases lacking LLM consensus. Evaluated on a Korean hate speech dataset with Indonesian and German participants, our system achieves 78% accuracy (surpassing GPT-4o’s 71% baseline) while reducing human workload by 83.6%. In addition, cultural context annotations improved non-native moderator accuracy from 22% to 61%, with humans notably excelling at nuanced tasks where LLMs struggle. Our findings demonstrate that non-native moderators, when properly supported by LLMs, can effectively contribute to cross-cultural hate speech moderation.
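The routing idea behind steps (2) and (3) of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the label strings, and the choice of unanimous agreement as the consensus rule are all assumptions made for illustration, since the abstract does not specify how consensus is determined.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def route_by_consensus(votes: Sequence[str]) -> Tuple[Optional[str], bool]:
    """Return (label, needs_human) for one comment.

    Assumption: "consensus" means every LLM vote agrees. The abstract
    does not state the actual rule, so this is a hypothetical stand-in.
    """
    if not votes:
        # No LLM output at all: defer to a human moderator.
        return None, True
    label, top_count = Counter(votes).most_common(1)[0]
    if top_count == len(votes):
        return label, False  # unanimous, accept the LLM decision
    return None, True  # disagreement, escalate to a human


def human_workload_share(votes_per_comment: Dict[str, Sequence[str]]) -> float:
    """Fraction of comments that would be escalated to human moderators."""
    if not votes_per_comment:
        return 0.0
    escalated = sum(
        1 for votes in votes_per_comment.values() if route_by_consensus(votes)[1]
    )
    return escalated / len(votes_per_comment)


# Hypothetical votes from three LLM runs per comment.
example_votes = {
    "c1": ["hate", "hate", "hate"],
    "c2": ["hate", "not_hate", "hate"],
}
decisions = {cid: route_by_consensus(v) for cid, v in example_votes.items()}
```

In this toy input, `c1` is accepted automatically and `c2` is escalated, so half of the comments would reach a human; a stricter or looser consensus rule would shift that share.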
WHEN TOM EATS KIMCHI: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts
Jun Seong Kim | Kyaw Ye Thu | Javad Ismayilzada | Junyeong Park | Eunsu Kim | Huzama Ahmad | Na Min An | James Thorne | Alice Oh
Proceedings of the 3rd Workshop on Cross-Cultural Considerations in NLP (C3NLP 2025)
In a highly globalized world, it is important for multi-modal large language models (MLLMs) to recognize and respond correctly to mixed-cultural inputs. For example, a model should correctly identify kimchi (Korean food) in an image both when an Asian woman is eating it and when an African man is eating it. However, current MLLMs show an over-reliance on the visual features of the person, leading to misclassification of the entities. To examine the robustness of MLLMs to different ethnicities, we introduce MIXCUBE, a cross-cultural bias benchmark, and study elements from five countries and four ethnicities. Our findings reveal that MLLMs achieve both higher accuracy and lower sensitivity to such perturbation for high-resource cultures, but not for low-resource cultures. GPT-4o, the best-performing model overall, shows up to 58% difference in accuracy between the original and perturbed cultural settings in low-resource cultures.
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
Zahra Bayramli | Ayhan Suleymanzade | Na Min An | Huzama Ahmad | Eunsu Kim | Junyeong Park | James Thorne | Alice Oh
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Text-to-image diffusion models have recently enabled the creation of visually compelling, detailed images from textual prompts. However, their ability to accurately represent various cultural nuances remains an open question. In our work, we introduce the CULTDIFF benchmark, evaluating whether state-of-the-art diffusion models can generate culturally specific images spanning ten countries. Through a fine-grained analysis of different similarity aspects, we show that these models often fail to generate cultural artifacts in architecture, clothing, and food, especially for underrepresented country regions, and we find significant disparities in cultural relevance, description fidelity, and realism compared to real-world reference images. With the collected human evaluations, we develop a neural-based image-image similarity metric, namely, CULTDIFF-S, to predict human judgment on real and generated images with cultural artifacts. Our work highlights the need for more inclusive generative AI systems and equitable dataset representation over a wide range of cultures.
Co-authors
- Alice Oh 6
- Eunsu Kim 3
- Huzama Ahmad 2
- Na Min An 2
- Seyoung Song 2
- James Thorne 2
- Arnav Arora 1
- Isabelle Augenstein 1
- Zahra Bayramli 1
- A. Seza Doğruöz 1
- Faiz Ghifari Haznitrama 1
- Javad Ismayilzada 1
- Hyeju Jang 1
- Seogyeong Jeong 1
- Jiho Jin 1
- Keummin Ka 1
- Jun-Seong Kim 1
- Najoung Kim 1
- Shubin Kim 1
- Jaeyoung Lee 1
- Seungbeen Lee 1
- Yohan Lee 1
- Junho Myung 1
- Juhyun Oh 1
- Kiwoong Park 1
- Siddhesh Pawar 1
- Yejin Son 1
- Inhwa Song 1
- Ayhan Suleymanzade 1
- Kyaw Ye Thu 1
- Srishti Yadav 1
- Youngjae Yu 1