Jaehyo Yoo


2022

Simple Questions Generate Named Entity Recognition Datasets
Hyunjae Kim | Jaehyo Yoo | Seunghyun Yoon | Jinhyuk Lee | Jaewoo Kang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recent named entity recognition (NER) models often rely on human-annotated datasets, which require extensive expert knowledge of the target domain and entities. This work introduces an ask-to-generate approach, which automatically generates NER datasets by asking simple natural language questions (e.g., “Which disease?”) to an open-domain question answering system. Despite using fewer training resources, our models trained solely on the generated datasets outperform strong low-resource models by 19.5 F1 points across six popular NER benchmarks. Our models are also competitive with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks, achieving new state-of-the-art performance.