Abstract
In this paper we propose and carefully evaluate a sequence labeling framework that relies solely on sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification to the word representations employed for the different tasks. The proposed model has favorable generalization properties, as it retains over 89.8% of its average POS tagging accuracy when trained on just 1.2% of the total available training data, i.e. 150 sentences per language.
- Anthology ID: Q17-1018
- Volume: Transactions of the Association for Computational Linguistics, Volume 5
- Year: 2017
- Address: Cambridge, MA
- Editors: Lillian Lee, Mark Johnson, Kristina Toutanova
- Venue: TACL
- Publisher: MIT Press
- Pages: 247–261
- URL: https://aclanthology.org/Q17-1018
- DOI: 10.1162/tacl_a_00059
- Cite (ACL): Gábor Berend. 2017. Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling. Transactions of the Association for Computational Linguistics, 5:247–261.
- Cite (Informal): Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling (Berend, TACL 2017)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/Q17-1018.pdf
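The core idea in the abstract — sparse-coding dense word embeddings and using the nonzero coordinates of each word's sparse code as indicator features — can be sketched roughly as follows. This is a minimal illustration, not the author's exact pipeline: the dictionary size, regularization weight, and the sign-based feature naming below are illustrative assumptions, and the embeddings here are random stand-ins for real pretrained vectors.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy stand-in for pretrained dense word embeddings
# (vocabulary of 50 words, embedding dimension 16).
rng = np.random.RandomState(0)
embeddings = rng.randn(50, 16)

# Learn an overcomplete dictionary and sparse codes for every word;
# alpha controls how sparse the resulting codes are.
dl = DictionaryLearning(
    n_components=32,                # dictionary atoms (illustrative choice)
    alpha=1.0,                      # l1 sparsity penalty (illustrative choice)
    transform_algorithm="lasso_lars",
    random_state=0,
)
codes = dl.fit_transform(embeddings)  # shape (50, 32), mostly zeros

def indicator_features(code, prefix="F"):
    """Turn one sparse code vector into string-valued indicator features:
    one feature per nonzero coordinate, tagged with the coefficient's sign.
    The naming scheme is a hypothetical example, not the paper's exact one."""
    return [f"{prefix}{i}_{'+' if c > 0 else '-'}"
            for i, c in enumerate(code) if c != 0]

# A handful of such sparse features per word would then feed a
# linear sequence labeler (e.g. a CRF) for POS tagging or NER.
feats = indicator_features(codes[0])
```

Each word thus contributes only a few active features (the nonzero code indices), which is what keeps the overall feature space down to a few thousand indicators while leaving the underlying embeddings untouched.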