APTFiNER: Annotation Preserving Translation for Fine-grained Named Entity Recognition
Prachuryya Kaushik, Adittya Gupta, Ajanta Maurya, Gautam Sharma, V. V. Saradhi, Ashish Anand
Abstract
We present APTFiNER, a novel fine-grained named entity recognition (FgNER) dataset covering six low-resource Indian languages spoken by over 400 million people across various nations. Creating FgNER resources through manual annotation is typically expensive and labor-intensive, and distant supervision has emerged as a workable alternative. Yet such FgNER datasets are often noisy, because each entity mention is often assigned multiple entity types, which necessitates computationally demanding noise-aware models. Furthermore, resources for both coarse-grained and fine-grained NER tasks remain scarce for low-resource languages. To overcome this scarcity, we utilized the reasoning and translation capability of Gemini through the proposed annotation-preserving translation method and created a large-scale FgNER dataset comprising over 411 thousand sentences, 697 thousand entity mentions, and 5.8 million tokens in total. We translated the MultiCoNER2 English FgNER dataset to the target languages: Assamese (as), Marathi (mr), Nepali (ne), Tamil (ta), Telugu (te), and a vulnerable language, Bodo (brx). Rigorous analyses and human evaluations confirm the effectiveness of our method and the high quality of the resulting dataset, with F1 score improvements of 8% in both Tamil and Telugu, and 25% in Marathi over the current state-of-the-art. The dataset, expert detector models, the agentic tool, and the interactive web application are available as open-source resources at: https://hf.co/collections/prachuryyaIITG/aptfiner
- Anthology ID:
- 2026.lrec-main.608
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 7668–7680
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.608/
- Cite (ACL):
- Prachuryya Kaushik, Adittya Gupta, Ajanta Maurya, Gautam Sharma, V. V. Saradhi, and Ashish Anand. 2026. APTFiNER: Annotation Preserving Translation for Fine-grained Named Entity Recognition. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 7668–7680, Palma de Mallorca, Spain. ELRA Language Resource Association.
- Cite (Informal):
- APTFiNER: Annotation Preserving Translation for Fine-grained Named Entity Recognition (Kaushik et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.608.pdf