Constructing a Korean Named Entity Recognition Dataset for the Financial Domain using Active Learning

Dong-Ho Jeong, Min-Kang Heo, Hyung-Chul Kim, Sang-Won Park


Abstract
The performance of deep learning models depends on the quality and quantity of data. Data construction, however, is time-consuming and costly. Moreover, when data for an expert domain are constructed, the availability of experts is limited. In such cases, active learning can efficiently improve model performance while keeping data construction to a minimum. Although various datasets have been constructed using active learning techniques, the construction of Korean data for expert domains has not yet been studied extensively. In this study, a named entity recognition corpus for the financial domain was constructed using active learning. The contributions of the study are twofold: (1) it verifies that active learning can effectively construct a named entity recognition corpus for the financial domain, and (2) it develops a named entity recognizer for the financial domain. A dataset of 8,043 sentences was constructed using the proposed method, and the performance of the named entity recognizer reached 80.84%. Moreover, the proposed method reduced data construction costs by 12–25%.
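As a rough illustration of how active learning can decide which sentences to send to annotators, the sketch below ranks unlabeled sentences by a least-confidence score. The abstract does not state which query strategy the study used, so the scoring rule, the function names `sentence_uncertainty` and `select_for_annotation`, and the toy probabilities are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List


def sentence_uncertainty(token_probs: List[List[float]]) -> float:
    """Least-confidence score (an assumed strategy, not taken from the paper).

    token_probs holds one label distribution per token. The score is
    1 minus the mean of each token's highest label probability, so a
    sentence the model is unsure about scores higher.
    """
    if not token_probs:
        return 0.0
    top = [max(dist) for dist in token_probs]
    return 1.0 - sum(top) / len(top)


def select_for_annotation(pool: Dict[str, List[List[float]]], budget: int) -> List[str]:
    """Return the ids of the `budget` most uncertain sentences in the pool."""
    ranked = sorted(pool, key=lambda sid: sentence_uncertainty(pool[sid]), reverse=True)
    return ranked[:budget]


# Toy pool: hypothetical per-token label probabilities from a tagger.
pool = {
    "s1": [[0.9, 0.1], [0.8, 0.2]],  # confident predictions
    "s2": [[0.5, 0.5], [0.6, 0.4]],  # least confident
    "s3": [[0.7, 0.3], [0.7, 0.3]],
}

print(select_for_annotation(pool, 2))  # ['s2', 's3']
```

In a full loop, the selected sentences would be labeled by experts, added to the training set, and the tagger retrained before the next round of selection.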
Anthology ID:
2020.icon-main.27
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2020
Address:
Indian Institute of Technology Patna, Patna, India
Editors:
Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
208–212
URL:
https://aclanthology.org/2020.icon-main.27
Cite (ACL):
Dong-Ho Jeong, Min-Kang Heo, Hyung-Chul Kim, and Sang-Won Park. 2020. Constructing a Korean Named Entity Recognition Dataset for the Financial Domain using Active Learning. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 208–212, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
Constructing a Korean Named Entity Recognition Dataset for the Financial Domain using Active Learning (Jeong et al., ICON 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.icon-main.27.pdf