Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

Gábor Berend

doi:10.1162/tacl_a_00059


Abstract

In this paper, we propose and carefully evaluate a sequence labeling framework that relies solely on sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech (POS) tagging and named entity recognition across a variety of languages. Our model relies on only a few thousand sparse coding-derived features and applies no task-specific modification to the word representations it employs. The proposed model has favorable generalization properties: it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e., 150 sentences per language.
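
The abstract does not spell out how the sparse indicator features are obtained, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes the standard L1-regularized (lasso) sparse coding objective, min over a of 0.5 * ||x - D a||^2 + lam * ||a||_1, with a dictionary D that has already been learned; dictionary learning itself is omitted. The solver (plain ISTA), the values of `lam` and `step`, and the `P{i}`/`N{i}` feature names are illustrative assumptions. Each nonzero coefficient of a word's sparse code becomes one binary feature that a sequence labeler could consume.

```python
from typing import List, Set


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x: List[float], atoms: List[List[float]], lam: float,
                step: float, iters: int = 200) -> List[float]:
    """Approximately solve min_a 0.5*||x - sum_i a[i]*atoms[i]||^2 + lam*||a||_1 with ISTA.

    Assumption: `atoms` is a fixed, already-learned dictionary (k atoms of
    dimension d). For convergence, `step` should not exceed 1 / L, where L is
    the largest eigenvalue of D^T D.
    """
    k, d = len(atoms), len(x)
    a = [0.0] * k
    for _ in range(iters):
        # Residual r = x - D a of the current reconstruction.
        recon = [sum(a[i] * atoms[i][j] for i in range(k)) for j in range(d)]
        r = [x[j] - recon[j] for j in range(d)]
        # Gradient step on the squared error, then L1 shrinkage.
        a = [soft_threshold(a[i] + step * sum(atoms[i][j] * r[j] for j in range(d)),
                            step * lam)
             for i in range(k)]
    return a


def indicator_features(a: List[float], eps: float = 1e-8) -> Set[str]:
    """Map each nonzero coefficient to a sign-aware binary feature (hypothetical naming)."""
    feats = set()
    for i, v in enumerate(a):
        if v > eps:
            feats.add(f"P{i}")  # positive coefficient on atom i
        elif v < -eps:
            feats.add(f"N{i}")  # negative coefficient on atom i
    return feats
```

For example, with a toy 3-dimensional "embedding" `[0.9, -0.05, -0.6]`, an identity dictionary, `lam=0.1` and `step=1.0`, the code is approximately `[0.8, 0.0, -0.5]` and the word receives the features `{"P0", "N2"}`. With an orthonormal dictionary like this, the solution is simply soft thresholding of the embedding. Dictionaries used in practice are learned and overcomplete (more atoms than embedding dimensions), which is what makes the resulting codes sparse yet informative.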

Anthology ID: Q17-1018
Volume: Transactions of the Association for Computational Linguistics, Volume 5
Year: 2017
Address: Cambridge, MA
Editors: Lillian Lee, Mark Johnson, Kristina Toutanova
Venue: TACL
Publisher: MIT Press
Pages: 247–261
URL: https://preview.aclanthology.org/ingest-emnlp/Q17-1018/
DOI: 10.1162/tacl_a_00059
Cite (ACL): Gábor Berend. 2017. Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling. Transactions of the Association for Computational Linguistics, 5:247–261.
Cite (Informal): Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling (Berend, TACL 2017)
PDF: https://preview.aclanthology.org/ingest-emnlp/Q17-1018.pdf
Presentation: Q17-1018.Presentation.pdf
Video: https://preview.aclanthology.org/ingest-emnlp/Q17-1018.mp4
