Abstract
We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.

- Anthology ID:
- Q14-1016
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 2
- Year:
- 2014
- Address:
- Cambridge, MA
- Editors:
- Dekang Lin, Michael Collins, Lillian Lee
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 193–206
- URL:
- https://aclanthology.org/Q14-1016
- DOI:
- 10.1162/tacl_a_00176
- Cite (ACL):
- Nathan Schneider, Emily Danchik, Chris Dyer, and Noah A. Smith. 2014. Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut. Transactions of the Association for Computational Linguistics, 2:193–206.
- Cite (Informal):
- Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut (Schneider et al., TACL 2014)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/Q14-1016.pdf
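The abstract's key representational move is to extend a standard chunking (BIO-style) encoding so that an MWE may contain a gap. What follows is a minimal illustrative sketch under stated assumptions, not the paper's implementation. It assumes a hypothetical six-tag inventory (`O`, `B`, `I` for tokens outside any gap; `o`, `b`, `i` for tokens inside one). It also assumes that every MWE has at least two tokens and that only one level of nesting is allowed, so a gap may contain contiguous MWEs but not further gappy ones. Well-formedness is checked with a regular expression. The helper names `encode`, `decode`, and `is_valid` are illustrative only.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical six-tag gappy chunking scheme (an assumption of this sketch):
#   O = outside any MWE        B = begins a top-level MWE   I = continues it
#   o = inside a gap but not part of any MWE
#   b = begins an MWE nested inside a gap   i = continues that nested MWE
# Every MWE has at least two tokens; gaps may contain only contiguous MWEs.
VALID_TAGS = set("OBIobi")

# Top level: O tokens, or an MWE that starts with B, ends with I, and may
# enclose gap material made of o tokens and contiguous "b i+" MWEs.
GAPPY_BIO = re.compile(r"(?:O|B(?:o|bi+|I)*I+)+")


def is_valid(tags: Sequence[str]) -> bool:
    """Return True if the (non-empty) tag sequence is well formed."""
    if any(t not in VALID_TAGS for t in tags):
        return False
    return GAPPY_BIO.fullmatch("".join(tags)) is not None


def encode(n: int, mwes: Sequence[Sequence[int]]) -> List[str]:
    """Tag n tokens, given each MWE as a collection of token indices."""
    groups = [sorted(m) for m in mwes]

    def in_gap(idx: int) -> bool:
        # True if idx lies strictly inside another MWE's span without belonging to it.
        return any(g[0] < idx < g[-1] and idx not in g for g in groups)

    # Tokens not covered by any MWE become o (inside a gap) or O (outside).
    tags = ["o" if in_gap(i) else "O" for i in range(n)]
    for g in groups:
        begin, cont = ("b", "i") if in_gap(g[0]) else ("B", "I")
        tags[g[0]] = begin
        for idx in g[1:]:
            tags[idx] = cont
    return tags


def decode(tags: Sequence[str]) -> List[Tuple[int, ...]]:
    """Group token indices into MWEs; assumes is_valid(tags) is True."""
    mwes: List[Tuple[int, ...]] = []
    outer: List[int] = []  # indices of the current top-level MWE
    inner: List[int] = []  # indices of the current MWE nested inside a gap
    for idx, tag in enumerate(tags):
        if tag != "i" and inner:  # any tag other than i closes a nested MWE
            mwes.append(tuple(inner))
            inner = []
        if tag in "OB" and outer:  # a new top-level token closes the outer MWE
            mwes.append(tuple(outer))
            outer = []
        if tag in "BI":
            outer.append(idx)
        elif tag in "bi":
            inner.append(idx)
    # Flush whatever is still open at the end of the sentence.
    if inner:
        mwes.append(tuple(inner))
    if outer:
        mwes.append(tuple(outer))
    return sorted(mwes)


if __name__ == "__main__":
    tokens = ["He", "picked", "the", "box", "up"]
    tags = encode(len(tokens), [(1, 4)])  # "picked ... up" with a two-token gap
    print(tags)            # ['O', 'B', 'o', 'o', 'I']
    print(is_valid(tags))  # True
    print(decode(tags))    # [(1, 4)]
```

Under these assumptions, well-formedness depends only on adjacent tag pairs plus which tags may begin or end a sentence (for example, `o` may never be followed by `O`, and a sentence may not end in `o`, `b`, or `i`). A first-order tagger with a Viterbi-style decoder could therefore enforce it simply by disallowing invalid transitions.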