Comprehensive Annotation of Multiword Expressions in a Social Web Corpus

Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, Noah A. Smith


Abstract
Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class. Here we advocate for a comprehensive annotation approach: proceeding sentence by sentence, our annotators manually group tokens into MWEs according to guidelines that cover a broad range of multiword phenomena. Under this scheme, we have fully annotated an English web corpus for multiword expressions, including those containing gaps.
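As an illustration of what grouping tokens into MWEs, "including those containing gaps", could look like as data, here is a minimal sketch. It is not the authors' annotation format or tooling: the index-tuple representation, the function names (has_gap, validate_groups, mwe_text), and the example sentence are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Group = Tuple[int, ...]  # hypothetical: one MWE as sorted token indices


def has_gap(group: Group) -> bool:
    """True if the group skips over at least one intervening token."""
    return any(b - a > 1 for a, b in zip(group, group[1:]))


def validate_groups(tokens: Sequence[str], groups: List[Group]) -> None:
    """Check basic well-formedness of a sentence's MWE groups.

    Assumed constraints for this sketch: each group has at least two
    tokens, indices are in range and strictly increasing, and no token
    belongs to more than one group.
    """
    seen = set()
    for group in groups:
        if len(group) < 2:
            raise ValueError(f"group {group} has fewer than two tokens")
        if any(b <= a for a, b in zip(group, group[1:])):
            raise ValueError(f"group {group} is not strictly increasing")
        for i in group:
            if not 0 <= i < len(tokens):
                raise ValueError(f"index {i} is out of range")
            if i in seen:
                raise ValueError(f"token {i} appears in two groups")
            seen.add(i)


def mwe_text(tokens: Sequence[str], group: Group) -> str:
    """Join the tokens of one group, dropping any gap material."""
    return " ".join(tokens[i] for i in group)


# Invented example sentence (not taken from the corpus).
tokens = "She looked the address up in the phone book".split()
groups = [(1, 4), (7, 8)]  # "looked ... up" (gappy), "phone book" (contiguous)

validate_groups(tokens, groups)
for g in groups:
    kind = "gappy" if has_gap(g) else "contiguous"
    print(mwe_text(tokens, g), kind)
```

Storing indices rather than character spans keeps gappy expressions such as "looked ... up" as easy to represent as contiguous ones, at the cost of depending on a fixed tokenization.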
Anthology ID:
L14-1433
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
455–461
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/521_Paper.pdf
Cite (ACL):
Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, and Noah A. Smith. 2014. Comprehensive Annotation of Multiword Expressions in a Social Web Corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 455–461, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Comprehensive Annotation of Multiword Expressions in a Social Web Corpus (Schneider et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/521_Paper.pdf