Abstract
We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem. These HMMs, which we call anchor HMMs, assume that each tag is associated with at least one word that can have no other tag, which is a relatively benign condition for POS tagging (e.g., “the” is a word that appears only under the determiner tag). We exploit this assumption and extend the non-negative matrix factorization framework of Arora et al. (2013) to design a consistent estimator for anchor HMMs. In experiments, our algorithm is competitive with strong baselines such as the clustering method of Brown et al. (1992) and the log-linear model of Berg-Kirkpatrick et al. (2010). Furthermore, it produces an interpretable model in which hidden states are automatically lexicalized by words.
- Anthology ID:
- Q16-1018
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 4
- Year:
- 2016
- Address:
- Cambridge, MA
- Editors:
- Lillian Lee, Mark Johnson, Kristina Toutanova
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 245–257
- URL:
- https://aclanthology.org/Q16-1018
- DOI:
- 10.1162/tacl_a_00096
- Cite (ACL):
- Karl Stratos, Michael Collins, and Daniel Hsu. 2016. Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models. Transactions of the Association for Computational Linguistics, 4:245–257.
- Cite (Informal):
- Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models (Stratos et al., TACL 2016)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/Q16-1018.pdf
- Code:
- karlstratos/anchor
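
The anchor condition stated in the abstract (each tag has at least one word that can have no other tag) can be illustrated with a small check on an emission matrix. The sketch below is only an illustration of that condition, not the paper's estimator and not code from the linked repository; the function names `find_anchors` and `is_anchor_hmm`, the tolerance `tol`, and the toy vocabulary are all assumptions made for this example.

```python
import numpy as np


def find_anchors(emission, tol=1e-12):
    """Return {tag index: [anchor word indices]} for an emission matrix.

    emission[h, x] is assumed to hold p(word x | tag h). A word is treated
    as an anchor for tag h if h is the only tag that emits it with
    probability above `tol` (a hypothetical tolerance for this sketch).
    """
    emission = np.asarray(emission, dtype=float)
    support = emission > tol              # support[h, x]: tag h can emit word x
    tags_per_word = support.sum(axis=0)   # how many tags can emit each word
    anchors = {}
    for h in range(emission.shape[0]):
        mask = support[h] & (tags_per_word == 1)
        anchors[h] = [int(x) for x in np.flatnonzero(mask)]
    return anchors


def is_anchor_hmm(emission, tol=1e-12):
    """True if every tag has at least one anchor word."""
    return all(len(words) > 0 for words in find_anchors(emission, tol).values())


# Toy example (made-up numbers): tags 0=determiner, 1=noun, 2=verb;
# words 0="the", 1="dog", 2="walk", 3="runs".
toy_emission = [
    [1.0, 0.0, 0.0, 0.0],  # determiner emits only "the"
    [0.0, 0.6, 0.4, 0.0],  # noun emits "dog" and "walk"
    [0.0, 0.0, 0.5, 0.5],  # verb emits "walk" and "runs"
]
toy_anchors = find_anchors(toy_emission)
toy_ok = is_anchor_hmm(toy_emission)
```

In this toy matrix, "walk" is shared by the noun and verb tags and so anchors neither, while "the", "dog", and "runs" each occur under a single tag, so the condition holds.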