Abstract
In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties as it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e. 150 sentences per language.

- Anthology ID: Q17-1018
- Volume: Transactions of the Association for Computational Linguistics, Volume 5
- Year: 2017
- Address: Cambridge, MA
- Editors: Lillian Lee, Mark Johnson, Kristina Toutanova
- Venue: TACL
- Publisher: MIT Press
- Pages: 247–261
- URL: https://aclanthology.org/Q17-1018
- DOI: 10.1162/tacl_a_00059
- Cite (ACL): Gábor Berend. 2017. Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling. Transactions of the Association for Computational Linguistics, 5:247–261.
- Cite (Informal): Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling (Berend, TACL 2017)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/Q17-1018.pdf
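The core idea from the abstract — deriving sparse indicator features from dense word embeddings via sparse coding — can be sketched as follows. This is a minimal illustration using scikit-learn's `DictionaryLearning`, not the paper's exact dictionary-learning setup; the toy embeddings, dictionary size, and feature naming are all assumptions made here for demonstration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy stand-in for pretrained dense word embeddings (vocabulary x dimensions).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(40, 16))

# Learn an overcomplete dictionary; each word vector is then approximated by a
# sparse combination of its atoms (here OMP with at most 3 active atoms).
learner = DictionaryLearning(n_components=24, transform_algorithm="omp",
                             transform_n_nonzero_coefs=3, random_state=0)
codes = learner.fit_transform(embeddings)  # shape: (40 words, 24 atoms)

# Sparse indicator features: the signed indices of the nonzero coefficients
# per word (e.g. {"F3+", "F17-"}), which a linear sequence model could consume
# in place of the dense vectors themselves.
features = [{f"F{j}{'+' if c > 0 else '-'}" for j, c in enumerate(row) if c != 0}
            for row in codes]
```

With a few thousand dictionary atoms instead of 24, this yields the kind of compact, discrete feature space the abstract describes: each word contributes only a handful of active indicators, regardless of the embedding dimensionality.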