Abstract
Polysemy is the phenomenon where a single word form possesses two or more related senses. It is an extremely ubiquitous part of natural language and analyzing it has sparked rich discussions in the linguistics, psychology and philosophy communities alike. With scarce attention paid to polysemy in computational linguistics, and even scarcer attention toward quantifying polysemy, in this paper, we propose a novel, unsupervised framework to compute and estimate polysemy scores for words in multiple languages. We infuse our proposed quantification with syntactic knowledge in the form of dependency structures. This informs the final polysemy scores of the lexicon motivated by recent linguistic findings that suggest there is an implicit relation between syntax and ambiguity/polysemy. We adopt a graph based approach by computing the discrete Ollivier Ricci curvature on a graph of the contextual nearest neighbors. We test our framework on curated datasets controlling for different sense distributions of words in 3 typologically diverse languages - English, French and Spanish. The effectiveness of our framework is demonstrated by significant correlations of our quantification with expert human annotated language resources like WordNet. We observe a 0.3 point increase in the correlation coefficient as compared to previous quantification studies in English. Our research leverages contextual language models and syntactic structures to empirically support the widely held theoretical linguistic notion that syntax is intricately linked to ambiguity/polysemy.- Anthology ID:
- 2022.emnlp-main.722
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 10565–10574
- Language:
- URL:
- https://aclanthology.org/2022.emnlp-main.722
- DOI:
- Cite (ACL):
- Anmol Goel, Charu Sharma, and Ponnurangam Kumaraguru. 2022. An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10565–10574, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy (Goel et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-main.722.pdf