SubmissionNumber#=%=#58 FinalPaperTitle#=%=#Overcoming Data Scarcity in Named Entity Recognition: Synthetic Data Generation with Large Language Models ShortPaperTitle#=%=# NumberOfPages#=%=#13 CopyrightSigned#=%=#Dao Tuan An JobTitle#==# Organization#==#The University of Tokyo, Tokyo, Japan Abstract#==#Named Entity Recognition (NER) is crucial for extracting domain-specific entities from text, particularly in biomedical and chemical fields. Developing high-quality NER models in specialized domains is challenging due to the limited availability of annotated data, with manual annotation being a key method of data construction. However, manual annotation is time-consuming and requires domain expertise, making it difficult in specialized domains. Traditional data augmentation (DA) techniques also rely on annotated data to some extent, further limiting their effectiveness. In this paper, we propose a novel approach to synthetic data generation for NER using large language models (LLMs) to generate sentences based solely on a set of example entities. This method simplifies the augmentation process and is effective even with a limited set of entities. We evaluate our approach using BERT-based models on the BC4CHEMD, BC5CDR, and TDMSci datasets, demonstrating that synthetic data significantly improves model performance and robustness, particularly in low-resource settings. This work provides a scalable solution for enhancing NER in specialized domains, overcoming the limitations of manual annotation and traditional augmentation methods. 
Author{1}{Firstname}#=%=#An Author{1}{Lastname}#=%=#Dao Author{1}{Username}#=%=#daotuanan Author{1}{Email}#=%=#dtan@nii.ac.jp Author{1}{Affiliation}#=%=#The University of Tokyo Author{2}{Firstname}#=%=#Hiroki Author{2}{Lastname}#=%=#Teranishi Author{2}{Username}#=%=#chantera Author{2}{Email}#=%=#teranishihiroki@gmail.com Author{2}{Affiliation}#=%=#RIKEN Center for Advanced Intelligence Project Author{3}{Firstname}#=%=#Yuji Author{3}{Lastname}#=%=#Matsumoto Author{3}{Username}#=%=#matsu Author{3}{Email}#=%=#yuji.matsumoto@riken.jp Author{3}{Affiliation}#=%=#RIKEN Center for Advanced Intelligence Project Author{4}{Firstname}#=%=#Florian Author{4}{Lastname}#=%=#Boudin Author{4}{Username}#=%=#boudin-f Author{4}{Email}#=%=#florian.boudin@univ-nantes.fr Author{4}{Affiliation}#=%=#Nantes University Author{5}{Firstname}#=%=#Akiko Author{5}{Lastname}#=%=#Aizawa Author{5}{Username}#=%=#aizawa Author{5}{Email}#=%=#aizawa@nii.ac.jp Author{5}{Affiliation}#=%=#National Institute of Informatics ========== èéáğö