A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition
Yuxuan Chen, Jonas Mikkelsen, Arne Binder, Christoph Alt, Leonhard Hennig
Abstract
Pre-trained language models (PLMs) are effective components of few-shot named entity recognition (NER) approaches when augmented with continued pre-training on task-specific out-of-domain data or fine-tuning on in-domain data. However, their performance in low-resource scenarios, where such data is not available, remains an open question. We introduce an encoder evaluation framework and use it to systematically compare the performance of state-of-the-art pre-trained representations on the task of low-resource NER. We analyze a wide range of encoders pre-trained with different strategies, model architectures, intermediate-task fine-tuning, and contrastive learning. Our experimental results across ten benchmark NER datasets in English and German show that encoder performance varies significantly, suggesting that the choice of encoder for a specific low-resource scenario needs to be carefully evaluated.
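As a rough illustration of the low-resource setup the abstract describes (a minimal sketch, not the authors' fewie framework; the model name, label set, and example sentences are placeholder assumptions), one could freeze a pre-trained encoder, embed each token once, and fit a lightweight classifier on only a handful of labeled sentences:

```python
# Minimal sketch of frozen-encoder, low-resource NER evaluation.
# Model name, tag set, and sentences below are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "bert-base-cased"  # any pre-trained encoder under comparison

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def embed_tokens(words):
    """Return one frozen-encoder vector per word (first-subword pooling)."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]  # (num_subwords, dim)
    vectors = []
    for word_idx in range(len(words)):
        # indices of sub-word pieces belonging to this word
        span = [i for i, w in enumerate(enc.word_ids(0)) if w == word_idx]
        vectors.append(hidden[span[0]])  # keep the first sub-word piece
    return torch.stack(vectors).numpy()

# Tiny "support set": a few labeled sentences with O / PER / LOC tags.
support = [
    (["Alice", "visited", "Berlin", "."], ["PER", "O", "LOC", "O"]),
    (["Bob", "lives", "in", "Munich", "."], ["PER", "O", "O", "LOC", "O"]),
]
X = [vec for words, _ in support for vec in embed_tokens(words)]
y = [tag for _, tags in support for tag in tags]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Query sentence: tag each token using the classifier over frozen embeddings.
query = ["Carol", "flew", "to", "Paris", "."]
print(list(zip(query, clf.predict(embed_tokens(query)))))
```

Swapping MODEL_NAME for another encoder while keeping the rest fixed is the kind of controlled comparison the paper's evaluation framework is designed to make systematic.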
- Anthology ID: 2022.repl4nlp-1.6
- Volume: Proceedings of the 7th Workshop on Representation Learning for NLP
- Month: May
- Year: 2022
- Address: Dublin, Ireland
- Venue: RepL4NLP
- Publisher: Association for Computational Linguistics
- Pages: 46–59
- URL: https://aclanthology.org/2022.repl4nlp-1.6
- DOI: 10.18653/v1/2022.repl4nlp-1.6
- Cite (ACL): Yuxuan Chen, Jonas Mikkelsen, Arne Binder, Christoph Alt, and Leonhard Hennig. 2022. A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition. In Proceedings of the 7th Workshop on Representation Learning for NLP, pages 46–59, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal): A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition (Chen et al., RepL4NLP 2022)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2022.repl4nlp-1.6.pdf
- Code: dfki-nlp/fewie
- Data: CoNLL-2003, Few-NERD, WNUT 2017