A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition
Yuxuan Chen, Jonas Mikkelsen, Arne Binder, Christoph Alt, Leonhard Hennig
Abstract
Pre-trained language models (PLMs) are effective components of few-shot named entity recognition (NER) approaches when augmented with continued pre-training on task-specific out-of-domain data or fine-tuning on in-domain data. However, their performance in low-resource scenarios, where such data is not available, remains an open question. We introduce an encoder evaluation framework and use it to systematically compare the performance of state-of-the-art pre-trained representations on the task of low-resource NER. We analyze a wide range of encoders pre-trained with different strategies, model architectures, intermediate-task fine-tuning, and contrastive learning. Our experimental results across ten benchmark NER datasets in English and German show that encoder performance varies significantly, suggesting that the choice of encoder for a specific low-resource scenario needs to be carefully evaluated.
- Anthology ID:
- 2022.repl4nlp-1.6
- Volume:
- Proceedings of the 7th Workshop on Representation Learning for NLP
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Spandana Gella, He He, Bodhisattwa Prasad Majumder, Burcu Can, Eleonora Giunchiglia, Samuel Cahyawijaya, Sewon Min, Maximilian Mozes, Xiang Lorraine Li, Isabelle Augenstein, Anna Rogers, Kyunghyun Cho, Edward Grefenstette, Laura Rimell, Chris Dyer
- Venue:
- RepL4NLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 46–59
- URL:
- https://aclanthology.org/2022.repl4nlp-1.6
- DOI:
- 10.18653/v1/2022.repl4nlp-1.6
- Cite (ACL):
- Yuxuan Chen, Jonas Mikkelsen, Arne Binder, Christoph Alt, and Leonhard Hennig. 2022. A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition. In Proceedings of the 7th Workshop on Representation Learning for NLP, pages 46–59, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition (Chen et al., RepL4NLP 2022)
- PDF:
- https://preview.aclanthology.org/landing_page/2022.repl4nlp-1.6.pdf
- Code:
- dfki-nlp/fewie
- Data:
- CoNLL 2003, Few-NERD, WNUT 2017