Transfer Learning for Czech Historical Named Entity Recognition

Helena Hubková, Pavel Kral


Abstract
Nowadays, named entity recognition (NER) achieves excellent results on standard corpora. However, serious issues arise when NER must be applied in a specific domain, because this requires a suitable annotated corpus with an adapted NE tag set. This is particularly evident in the field of historical document processing. The main goal of this paper is to propose and evaluate several transfer learning methods to improve the score of Czech historical NER. We study several information sources and use two neural networks for NE modeling and recognition. We employ two corpora to evaluate our transfer learning methods, namely the Czech Named Entity Corpus and the Czech Historical Named Entity Corpus. We show that a BERT representation with fine-tuning, combined with only a simple classifier trained on the union of the corpora, achieves excellent results.
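The abstract does not say exactly how the two corpora are combined for the "union" setting. The following minimal sketch assumes both corpora are already in BIO format with a shared (or previously mapped) tag set; it concatenates their sentences and builds one label inventory, so that a single token classifier can be trained on both. The helper names `union_corpora` and `encode_labels` and the toy sentences are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, List, Tuple

# A sentence is a list of (token, BIO tag) pairs.
# Assumption: both corpora already use a common BIO tag set.
Sentence = List[Tuple[str, str]]


def union_corpora(*corpora: List[Sentence]) -> Tuple[List[Sentence], List[str]]:
    """Hypothetical helper: concatenate NER corpora and build one shared tag list."""
    merged: List[Sentence] = []
    tags = set()
    for corpus in corpora:
        for sentence in corpus:
            merged.append(sentence)
            tags.update(tag for _, tag in sentence)
    # Put "O" first and sort the entity tags so label ids are deterministic.
    tags.discard("O")
    label_list = ["O"] + sorted(tags)
    return merged, label_list


def encode_labels(sentence: Sentence, label_list: List[str]) -> List[int]:
    """Hypothetical helper: map each BIO tag to its integer id in the shared list."""
    label_to_id: Dict[str, int] = {label: i for i, label in enumerate(label_list)}
    return [label_to_id[tag] for _, tag in sentence]


# Toy, hand-made sentences standing in for a modern and a historical corpus.
modern = [[("Karel", "B-PER"), ("Čapek", "I-PER"), ("žil", "O"), ("v", "O"), ("Praze", "B-LOC")]]
historical = [[("Jan", "B-PER"), ("z", "O"), ("Brna", "B-LOC")]]

merged, labels = union_corpora(modern, historical)
print(labels)                             # ['O', 'B-LOC', 'B-PER', 'I-PER']
print(encode_labels(merged[1], labels))   # [2, 0, 1]
```

Because both corpora share one label list, the output layer of the classifier (for example, a linear layer on top of BERT token representations) has a single size and every training sentence can be encoded the same way, regardless of which corpus it came from.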
Anthology ID: 2021.ranlp-1.65
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month: September
Year: 2021
Address: Held Online
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 576–582
URL: https://aclanthology.org/2021.ranlp-1.65
Cite (ACL): Helena Hubková and Pavel Kral. 2021. Transfer Learning for Czech Historical Named Entity Recognition. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 576–582, Held Online. INCOMA Ltd.
Cite (Informal): Transfer Learning for Czech Historical Named Entity Recognition (Hubková & Kral, RANLP 2021)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2021.ranlp-1.65.pdf