Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut

Nathan Schneider; Emily Danchik; Chris Dyer; Noah A. Smith

doi:10.1162/tacl_a_00176

Abstract

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.

Anthology ID: Q14-1016
Volume: Transactions of the Association for Computational Linguistics, Volume 2
Year: 2014
Address: Cambridge, MA
Editors: Dekang Lin, Michael Collins, Lillian Lee
Venue: TACL
Publisher: MIT Press
Pages: 193–206
URL: https://preview.aclanthology.org/nschneid-patch-1/Q14-1016/
DOI: 10.1162/tacl_a_00176
Cite (ACL): Nathan Schneider, Emily Danchik, Chris Dyer, and Noah A. Smith. 2014. Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut. Transactions of the Association for Computational Linguistics, 2:193–206.
Cite (Informal): Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut (Schneider et al., TACL 2014)
PDF: https://preview.aclanthology.org/nschneid-patch-1/Q14-1016.pdf
