APTFiNER: Annotation Preserving Translation for Fine-grained Named Entity Recognition

Prachuryya Kaushik, Adittya Gupta, Ajanta Maurya, Gautam Sharma, V. V. Saradhi, Ashish Anand


Abstract
We present APTFiNER, a novel fine-grained named entity recognition (FgNER) dataset covering six low-resource Indian languages spoken by over 400 million people across various nations. While creating FgNER resources through manual annotation is typically expensive and labor-intensive, distant supervision has emerged as a workable alternative. Yet such distantly supervised FgNER datasets are often noisy, as each entity mention is frequently assigned multiple entity types, which necessitates computationally demanding noise-aware models. Furthermore, resources for both coarse-grained and fine-grained NER tasks remain scarce for low-resource languages. To overcome this scarcity, we utilized the superior reasoning and translation capability of Gemini through the proposed annotation-preserving translation method and created a large-scale FgNER dataset comprising over 411 thousand sentences, 697 thousand entity mentions, and 5.8 million tokens in total. We translated the MultiCoNER2 English FgNER dataset into the target languages: <i>Assamese (as)</i>, <i>Marathi (mr)</i>, <i>Nepali (ne)</i>, <i>Tamil (ta)</i>, <i>Telugu (te)</i>, and a vulnerable language, <i>Bodo (brx)</i>. Rigorous analyses and human evaluations confirm the effectiveness of our method and the high quality of the resulting dataset, with F1 score improvements of 8% in both Tamil and Telugu, and 25% in Marathi, over the current state of the art. The dataset, expert detector models, the agentic tool, and the interactive web application are available as open-source resources at: <url>https://hf.co/collections/prachuryyaIITG/aptfiner</url>.
Anthology ID:
2026.lrec-main.608
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
7668–7680
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.608/
Cite (ACL):
Prachuryya Kaushik, Adittya Gupta, Ajanta Maurya, Gautam Sharma, V. V. Saradhi, and Ashish Anand. 2026. APTFiNER: Annotation Preserving Translation for Fine-grained Named Entity Recognition. International Conference on Language Resources and Evaluation, main:7668–7680.
Cite (Informal):
APTFiNER: Annotation Preserving Translation for Fine-grained Named Entity Recognition (Kaushik et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.608.pdf