Abstract
With the expanding use of pre-trained language models in recent years, datasets for tasks in specialized domains have become increasingly important. We therefore built ESG-Kor, a Korean dataset for automatically extracting Environmental, Social, and Governance (ESG) information, which has recently gained prominence. ESG-Kor consists of 118,946 sentences containing information on each ESG component, extracted from Korean companies’ sustainability reports and manually labeled according to objective rules provided by ESG evaluation agencies. To verify the effectiveness and applicability of ESG-Kor, we evaluated classification performance with several Korean pre-trained language models and obtained strong results. Additionally, by extending the ESG classification model to documents from small and medium-sized enterprises and extracting information based on key ESG issues and in-depth analysis, we demonstrated the dataset's potential and practical use cases in the ESG field.
- Anthology ID:
- 2024.findings-emnlp.387
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6627–6643
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2024.findings-emnlp.387/
- DOI:
- 10.18653/v1/2024.findings-emnlp.387
- Cite (ACL):
- Jaeyoung Lee, Geonyeong Son, and Misuk Kim. 2024. ESG-Kor: A Korean Dataset for ESG-related Information Extraction and Practical Use Cases. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6627–6643, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- ESG-Kor: A Korean Dataset for ESG-related Information Extraction and Practical Use Cases (Lee et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2024.findings-emnlp.387.pdf