Simple Questions Generate Named Entity Recognition Datasets
Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang
Abstract
Recent named entity recognition (NER) models often rely on human-annotated datasets, which require extensive professional knowledge of the target domain and entities. This work introduces an ask-to-generate approach that automatically generates NER datasets by asking simple natural language questions (e.g., “Which disease?”) to an open-domain question answering system. Despite using fewer training resources, our models trained solely on the generated datasets outperform strong low-resource models by a large margin of 19.5 F1 across six popular NER benchmarks. Our models also perform competitively with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 on three benchmarks and achieve new state-of-the-art performance.
- Anthology ID:
- 2022.emnlp-main.417
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6220–6236
- URL:
- https://aclanthology.org/2022.emnlp-main.417
- Cite (ACL):
- Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, and Jaewoo Kang. 2022. Simple Questions Generate Named Entity Recognition Datasets. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6220–6236, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Simple Questions Generate Named Entity Recognition Datasets (Kim et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2022.emnlp-main.417.pdf
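The ask-to-generate idea in the abstract (ask "Which disease?", then turn the answers into NER training data) can be illustrated with a small sketch. It assumes, hypothetically, that the question answering system returns (answer phrase, evidence sentence) pairs; it then collects all answers into a dictionary and labels every evidence sentence with greedy longest-match BIO tags. The function names, the regex tokenizer, the `DISEASE` label, and the toy data are illustrative assumptions, not the paper's implementation, and the authors' exact labeling procedure may differ.

```python
import re
from typing import List, Set, Tuple

# Hypothetical QA output: (answer phrase, evidence sentence) pairs.
QAResult = Tuple[str, str]


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def bio_tag(sentence: str, dictionary: Set[Tuple[str, ...]],
            max_len: int, label: str) -> List[Tuple[str, str]]:
    tokens = tokenize(sentence)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Greedy longest match of a dictionary entry starting at token i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return list(zip(tokens, tags))


def generate_dataset(results: List[QAResult], label: str) -> List[List[Tuple[str, str]]]:
    # Every answer to the question becomes a lowercase dictionary entry.
    dictionary = {tuple(t.lower() for t in tokenize(a)) for a, _ in results}
    dictionary.discard(())
    max_len = max((len(e) for e in dictionary), default=0)
    # Deduplicate evidence sentences while keeping their order.
    sentences = list(dict.fromkeys(s for _, s in results))
    return [bio_tag(s, dictionary, max_len, label) for s in sentences]


if __name__ == "__main__":
    toy_results = [
        ("lung cancer", "Smoking is a major risk factor for lung cancer."),
        ("diabetes", "Obesity increases the risk of diabetes and lung cancer."),
    ]
    for tagged in generate_dataset(toy_results, "DISEASE"):
        print(tagged)
```

Matching every collected answer against every sentence, rather than labeling only the span each question returned, lets one answer ("lung cancer") also annotate sentences retrieved for a different answer, which reduces unlabeled mentions of the same entity.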