Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging

Houquan Zhou, Yang Li, Zhenghua Li, Min Zhang


Abstract
In recent years, large-scale pre-trained language models (PLMs) have made extraordinary progress in most NLP tasks. In the unsupervised POS tagging task, however, works utilizing PLMs are few and fail to achieve state-of-the-art (SOTA) performance. The recent SOTA performance is yielded by a Gaussian HMM variant proposed by He et al. (2018). As a generative model, though, the HMM makes very strong independence assumptions, which makes it very challenging to incorporate contextualized word representations from PLMs. In this work, we propose, for the first time, a neural conditional random field autoencoder (CRF-AE) model for unsupervised POS tagging. The discriminative encoder of CRF-AE can straightforwardly incorporate ELMo word representations. Moreover, inspired by the feature-rich HMM, we reintroduce hand-crafted features into the decoder of CRF-AE. Finally, experiments clearly show that our model outperforms previous state-of-the-art models by a large margin on the Penn Treebank and the multilingual Universal Dependencies treebank v2.0.
Anthology ID:
2022.findings-acl.259
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3276–3290
URL:
https://aclanthology.org/2022.findings-acl.259
DOI:
10.18653/v1/2022.findings-acl.259
Cite (ACL):
Houquan Zhou, Yang Li, Zhenghua Li, and Min Zhang. 2022. Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3276–3290, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging (Zhou et al., Findings 2022)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2022.findings-acl.259.pdf
Software:
 2022.findings-acl.259.software.zip
Code
 Jacob-Zhou/FeatureCRFAE
Data
Penn Treebank, Universal Dependencies