An analysis of ambiguity in word sense annotations

David Jurgens


Abstract
Word sense annotation is a challenging task in which annotators determine which meaning of a word is present in a given context. In some contexts, a word usage may elicit multiple interpretations, so that annotators either disagree or are allowed to annotate the usage with multiple senses. While some prior work has allowed the latter, the extent to which multiple sense annotations are needed has not been assessed. The present work analyzes a dataset of instances annotated with multiple WordNet senses to assess the causes of the multiple interpretations and their relative frequencies, along with the effect of the multiple senses on the contextual interpretation. We show that contextual underspecification is the primary cause of multiple interpretations but that syllepsis still accounts for more than a third of the cases. In addition, we show that sense coarsening can only partially remove the need for labeling instances with multiple senses, and we provide suggestions for how future sense annotation guidelines might be developed to account for this need.
Anthology ID:
L14-1692
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3006–3012
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/904_Paper.pdf
Cite (ACL):
David Jurgens. 2014. An analysis of ambiguity in word sense annotations. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3006–3012, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
An analysis of ambiguity in word sense annotations (Jurgens, LREC 2014)