Abstract
Active learning can significantly reduce the annotation cost of data-driven techniques. However, previous active learning approaches for natural language processing mainly rely on the entropy-based uncertainty criterion and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Unlike previous active learning approaches, it draws linguistic criteria from the pre-trained language model to measure instances and helps select more effective instances for annotation. Experiments demonstrate that our approach achieves higher accuracy with fewer labeled training instances.
- Anthology ID:
- 2020.coling-main.130
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 1495–1504
- URL:
- https://aclanthology.org/2020.coling-main.130
- DOI:
- 10.18653/v1/2020.coling-main.130
- Cite (ACL):
- Guirong Bai, Shizhu He, Kang Liu, Jun Zhao, and Zaiqing Nie. 2020. Pre-trained Language Model Based Active Learning for Sentence Matching. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1495–1504, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Pre-trained Language Model Based Active Learning for Sentence Matching (Bai et al., COLING 2020)
- PDF:
- https://aclanthology.org/2020.coling-main.130.pdf
- Data
- MultiNLI, SNLI
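The entropy-based uncertainty criterion that the abstract contrasts against can be sketched in a few lines. This is a minimal illustration of the baseline selection strategy, not the paper's method; the function names and the `predict_proba` callback are hypothetical placeholders for a trained classifier.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(unlabeled, predict_proba, k):
    """Entropy-based uncertainty sampling (hypothetical interface):
    pick the k unlabeled instances the model is least certain about."""
    scored = [(entropy(predict_proba(x)), x) for x in unlabeled]
    scored.sort(key=lambda t: t[0], reverse=True)  # highest entropy first
    return [x for _, x in scored[:k]]
```

The paper's point is that this criterion looks only at the task model's predictive distribution; the proposed approach additionally uses linguistic signals from a pre-trained language model when scoring candidate instances.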