Improved CCG Parsing with Semi-supervised Supertagging

Mike Lewis, Mark Steedman



Abstract
Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.
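
The sketch below is not the authors' implementation; it is a minimal, self-contained illustration of the general idea the abstract describes: predicting a word's CCG lexical category (supertag) from vector-space word embeddings of a small context window, rather than from POS-based features. The vocabulary, category set, training pairs, and random embedding matrix are all toy stand-ins for the pre-trained embeddings and large category inventory used in the paper.

```python
# Toy sketch of embedding-based supertagging (illustrative only, not the
# model from the paper). Random vectors stand in for pre-trained embeddings.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny vocabulary and supertag set (real inventories have
# hundreds of categories).
vocab = {"the": 0, "cat": 1, "sleeps": 2, "dog": 3, "barks": 4, "<pad>": 5}
tags = {"NP/N": 0, "N": 1, "S\\NP": 2}

EMB_DIM, WINDOW = 8, 1                       # embedding size, context radius
E = rng.normal(size=(len(vocab), EMB_DIM))   # stand-in embedding table

def featurize(sentence, i):
    """Concatenate embeddings of the words in a window around position i."""
    padded = ["<pad>"] * WINDOW + sentence + ["<pad>"] * WINDOW
    window = padded[i:i + 2 * WINDOW + 1]
    return np.concatenate([E[vocab[w]] for w in window])

# Tiny labelled corpus: (sentence, per-token supertags).
data = [
    (["the", "cat", "sleeps"], ["NP/N", "N", "S\\NP"]),
    (["the", "dog", "barks"],  ["NP/N", "N", "S\\NP"]),
]

# One-layer softmax classifier over the concatenated window embeddings,
# trained with plain gradient descent on cross-entropy loss.
D = EMB_DIM * (2 * WINDOW + 1)
W = np.zeros((len(tags), D))
for _ in range(200):
    for sent, gold in data:
        for i, tag in enumerate(gold):
            x = featurize(sent, i)
            scores = W @ x
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()
            grad = probs.copy()
            grad[tags[tag]] -= 1.0           # d(cross-entropy)/d(scores)
            W -= 0.1 * np.outer(grad, x)

# Predict supertags for a new sentence built from known words.
test = ["the", "dog", "sleeps"]
inv_tags = {v: k for k, v in tags.items()}
print([inv_tags[int(np.argmax(W @ featurize(test, i)))] for i in range(len(test))])
```

In a full system, the predicted supertags (or their distributions) would be passed to a CCG parser as lexical category candidates; generalization to unseen words comes from the embeddings rather than from the labelled data alone.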
Anthology ID: Q14-1026
Volume: Transactions of the Association for Computational Linguistics, Volume 2
Year: 2014
Address: Cambridge, MA
Editors: Dekang Lin, Michael Collins, Lillian Lee
Venue: TACL
Publisher: MIT Press
Pages: 327–338
URL: https://aclanthology.org/Q14-1026
DOI: 10.1162/tacl_a_00186
Cite (ACL): Mike Lewis and Mark Steedman. 2014. Improved CCG Parsing with Semi-supervised Supertagging. Transactions of the Association for Computational Linguistics, 2:327–338.
Cite (Informal): Improved CCG Parsing with Semi-supervised Supertagging (Lewis & Steedman, TACL 2014)
PDF: https://preview.aclanthology.org/teach-a-man-to-fish/Q14-1026.pdf
Data: Penn Treebank