A Scalable Framework for Automated NER Annotation Correction in Low-Resource Languages

Toqeer Ehsan, Thamar Solorio


Abstract
Poor quality or noisy annotations in Named Entity Recognition (NER), as in any other NLP task, make it challenging to achieve state-of-the-art performance. In this paper, we present a multi-step framework to enhance the annotation quality of NER datasets by employing automated techniques. We propose a frequency-based iterative approach that leverages self-training and a dual-threshold mechanism to enhance inference confidence. Experimental evaluations on different NER datasets demonstrate significant improvements in NER performance with respect to the original datasets. This work further explores the potential of generative Large Language Models (LLMs) to perform NER for low-resource languages.
Anthology ID:
2026.findings-eacl.215
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4138–4151
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.215/
Cite (ACL):
Toqeer Ehsan and Thamar Solorio. 2026. A Scalable Framework for Automated NER Annotation Correction in Low-Resource Languages. In Findings of the Association for Computational Linguistics: EACL 2026, pages 4138–4151, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
A Scalable Framework for Automated NER Annotation Correction in Low-Resource Languages (Ehsan & Solorio, Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.215.pdf
Checklist:
2026.findings-eacl.215.checklist.pdf