Geonyeong Son


2025

From Curiosity to Clarity: Exploring the Impact of Consecutive Why-Questions
Geonyeong Son | Jaeyoung Lee | Misuk Kim
Findings of the Association for Computational Linguistics: NAACL 2025

Humans attempt to understand the real world by asking the fundamental question “Why?” when faced with incomprehensible situations in everyday life. Such why-questions provide essential knowledge that can help in understanding these situations. In this study, we conducted an end-to-end process to verify the utility of consecutive why-questions, from constructing a large language model (LLM)-based dataset to performing quantitative evaluation and analysis. First, we created the WHY-Chain dataset, consisting of answers generated by an LLM in response to chains of why-questions, including a validity check. We also incorporated objectives that effectively capture the “consecutive” characteristic of the data. Using the WHY-Chain dataset and two types of self-supervised objectives, we further trained a pre-trained model. As a result, the refined model demonstrated improved performance on downstream tasks that require commonsense reasoning. Additionally, we conducted various ablation studies to assess the impact of different factors, confirming the scalability of the proposed approach. Lastly, we confirmed the consistency of the logical information through reasoning-chain analysis of the answers generated from consecutive why-questions.

2024

ESG-Kor: A Korean Dataset for ESG-related Information Extraction and Practical Use Cases
Jaeyoung Lee | Geonyeong Son | Misuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2024

With the expansion of pre-trained language model usage in recent years, the importance of datasets for performing tasks in specialized domains has significantly increased. We therefore built a Korean dataset, ESG-Kor, for automatically extracting Environmental, Social, and Governance (ESG) information, which has recently gained importance. ESG-Kor consists of 118,946 sentences extracted from Korean companies’ sustainability reports, covering each ESG component and manually labeled according to objective rules provided by ESG evaluation agencies. To verify the effectiveness and applicability of the ESG-Kor dataset, we evaluated classification performance using several Korean pre-trained language models and obtained significant performance. Additionally, by extending the ESG classification model to documents of small and medium enterprises and extracting information based on ESG key issues and in-depth analysis, we demonstrated potential and practical use cases in the ESG field.