Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation
Joan Santoso, Patrick Sutanto, Billy Cahyadi, Esther Setiawan
Abstract
Named Entity Recognition (NER) is an important task, but to achieve great performance, it is usually necessary to collect a large amount of labeled data, incurring high costs. In this paper, we propose using open-source Large Language Models (LLMs) to generate NER data with only a few labeled examples, reducing the cost of human annotations. Our proposed method is very simple and can perform well using only a few labeled data points. Experimental results on diverse low-resource NER datasets show that our proposed data generation method can significantly improve the baseline. Additionally, our method can be used to augment datasets with class-imbalance problems and consistently improves model performance on macro-F1 metrics.
- Anthology ID:
- 2024.findings-acl.575
- Volume:
- Findings of the Association for Computational Linguistics ACL 2024
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand and virtual meeting
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9652–9667
- URL:
- https://aclanthology.org/2024.findings-acl.575
- Cite (ACL):
- Joan Santoso, Patrick Sutanto, Billy Cahyadi, and Esther Setiawan. 2024. Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation. In Findings of the Association for Computational Linguistics ACL 2024, pages 9652–9667, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
- Cite (Informal):
- Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation (Santoso et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.findings-acl.575.pdf