Patterns of Polysemy and Homonymy in Contextualised Language Models

Janosch Haber, Massimo Poesio


Abstract
One of the central aspects of contextualised language models is that they should be able to distinguish the meaning of lexically ambiguous words by their contexts. In this paper we investigate the extent to which the contextualised embeddings of word forms that display multiplicity of sense reflect traditional distinctions of polysemy and homonymy. To this end, we introduce an extended, human-annotated dataset of graded word sense similarity and co-predication acceptability, and evaluate how well the similarity of embeddings predicts similarity in meaning. Both types of human judgements indicate that the similarity of polysemic interpretations falls in a continuum between identity of meaning and homonymy. However, we also observe significant differences within the similarity ratings of polysemes, forming consistent patterns for different types of polysemic sense alternation. Our dataset thus appears to capture a substantial part of the complexity of lexical ambiguity, and can provide a realistic test bed for contextualised embeddings. Among the tested models, BERT Large shows the strongest correlation with the collected word sense similarity ratings, but struggles to consistently replicate the observed similarity patterns. When clustering ambiguous word forms based on their embeddings, the model displays high confidence in discerning homonyms and some types of polysemic alternations, but consistently fails for others.
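The evaluation idea described in the abstract, checking how well embedding similarity tracks human similarity ratings, can be illustrated with a minimal sketch. It assumes cosine similarity as the embedding similarity measure and Spearman rank correlation as the agreement measure; the abstract names neither, so both are illustrative choices rather than the authors' confirmed setup. The vectors and ratings are toy values, not data from the paper, and all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy stand-ins for contextualised embeddings of one word form in two
# contexts each (assumption: real embeddings would come from a model).
embedding_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same sense
    ([1.0, 0.0], [1.0, 1.0]),  # related (polysemic) senses
    ([1.0, 0.0], [0.0, 1.0]),  # unrelated (homonymic) senses
]
# Toy human similarity ratings for the same three pairs (made up).
human_ratings = [1.0, 0.6, 0.1]

model_similarities = [cosine_similarity(u, v) for u, v in embedding_pairs]
rho = spearman(model_similarities, human_ratings)
print(f"Spearman correlation: {rho:.3f}")
```

A rank-based measure is used here because graded similarity judgements are ordinal in nature; a different correlation measure would slot into the same structure.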
Anthology ID:
2021.findings-emnlp.226
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2663–2676
URL:
https://aclanthology.org/2021.findings-emnlp.226
DOI:
10.18653/v1/2021.findings-emnlp.226
Cite (ACL):
Janosch Haber and Massimo Poesio. 2021. Patterns of Polysemy and Homonymy in Contextualised Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2663–2676, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Patterns of Polysemy and Homonymy in Contextualised Language Models (Haber & Poesio, Findings 2021)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2021.findings-emnlp.226.pdf
Video:
https://preview.aclanthology.org/ingest-2024-clasp/2021.findings-emnlp.226.mp4