Kiwoong Park
2026
Are they lovers or friends? Evaluating LLMs’ Social Reasoning in English and Korean Dialogues
Eunsu Kim | Junyeong Park | Juhyun Oh | Kiwoong Park | Seyoung Song | A. Seza Doğruöz | Alice Oh | Najoung Kim
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As LLMs are increasingly deployed in real-world interactions, their social reasoning in interpersonal communication becomes critical. To explore their capabilities, we introduce SCRIPTS, a 1.1k-dialogue dataset in English and Korean sourced from movie scripts, and propose a social reasoning task based on SCRIPTS that evaluates the capacity of LLMs to infer the social relationships (e.g., friends, lovers) between speakers in each dialogue. Evaluating nine models on our task, we find that current LLMs achieve around 75–80% on the English dataset and 58–69% on the Korean dataset, and that models predict an Unlikely relationship in 10–25% of responses in both languages. Furthermore, we find that thinking models and chain-of-thought prompting provide minimal benefits for social reasoning and occasionally amplify social biases. In sum, there are significant limitations in current LLMs’ social reasoning capabilities, especially for Korean, highlighting the need for efforts to develop socially aware LLMs across languages.
2020
Suicidal Risk Detection for Military Personnel
Sungjoon Park | Kiwoong Park | Jaimeen Ahn | Alice Oh
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
We analyze social media to detect the suicidal risk of military personnel, which is especially crucial for countries with compulsory military service such as the Republic of Korea. From a widely used Korean social Q&A site, we collect posts containing military-relevant content written by active-duty military personnel. We then annotate the posts with two groups of experts: military experts and mental health experts. Our dataset includes 2,791 posts with 13,955 corresponding expert annotations of suicidal risk levels, and it is available to researchers who consent to a research ethics agreement. Using various fine-tuned state-of-the-art language models, we predict the level of suicide risk, reaching an F1 score of .88 for classifying the risks.