Towards Realistic Single-Task Continuous Learning Research for NER
Justin Payan, Yuval Merhav, He Xie, Satyapriya Krishna, Anil Ramakrishna, Mukund Sridhar, Rahul Gupta
Abstract
There is an increasing interest in continuous learning (CL), as data privacy is becoming a priority for real-world machine learning applications. Meanwhile, there is still a lack of academic NLP benchmarks that are applicable for realistic CL settings, which is a major challenge for the advancement of the field. In this paper we discuss some of the unrealistic data characteristics of public datasets, study the challenges of realistic single-task continuous learning as well as the effectiveness of data rehearsal as a way to mitigate accuracy loss. We construct a CL NER dataset from an existing publicly available dataset and release it along with the code to the research community.
- Anthology ID: 2021.findings-emnlp.319
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
- Month: November
- Year: 2021
- Address: Punta Cana, Dominican Republic
- Venue: Findings
- SIG: SIGDAT
- Publisher: Association for Computational Linguistics
- Pages: 3773–3783
- URL: https://aclanthology.org/2021.findings-emnlp.319
- DOI: 10.18653/v1/2021.findings-emnlp.319
- Cite (ACL): Justin Payan, Yuval Merhav, He Xie, Satyapriya Krishna, Anil Ramakrishna, Mukund Sridhar, and Rahul Gupta. 2021. Towards Realistic Single-Task Continuous Learning Research for NER. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3773–3783, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): Towards Realistic Single-Task Continuous Learning Research for NER (Payan et al., Findings 2021)
- PDF: https://preview.aclanthology.org/starsem-semeval-split/2021.findings-emnlp.319.pdf
- Code: justinpayan/stackoverflowner-ns