Word Embeddings as Tuples of Feature Probabilities

Siddharth Bhat, Alok Debnath, Souvik Banerjee, Manish Shrivastava


Abstract
In this paper, we provide an alternate perspective on word representations by reinterpreting the dimensions of the vector space of a word embedding as a collection of features. In this reinterpretation, every component of the word vector is normalized against all the word vectors in the vocabulary. This allows us to view each vector as an n-tuple (akin to a fuzzy set), where n is the dimensionality of the word representation and each element represents the probability of the word possessing a feature. Indeed, this representation enables the use of fuzzy set-theoretic operations, such as union, intersection and difference. Unlike previous attempts, we show that this representation of words provides a notion of similarity which is inherently asymmetric and hence closer to human similarity judgements. We compare the performance of this representation against various benchmarks, and explore some of its unique properties, including function word detection, detection of polysemous words, and the interpretability provided by set-theoretic operations.
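
A minimal sketch of the idea described in the abstract: normalize each embedding dimension across the vocabulary so every component becomes a probability, then treat each word as a fuzzy set and apply set-theoretic operations plus an asymmetric, containment-style similarity. The per-dimension softmax normalization, the max/min fuzzy operators, and the similarity measure used here are illustrative assumptions, not necessarily the paper's exact definitions.

import numpy as np

def to_feature_probabilities(embeddings: np.ndarray) -> np.ndarray:
    """Normalize each dimension (column) across the vocabulary into probabilities.
    Assumed normalization: a softmax over the vocabulary axis, per feature."""
    exp = np.exp(embeddings - embeddings.max(axis=0, keepdims=True))
    return exp / exp.sum(axis=0, keepdims=True)

def fuzzy_union(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.maximum(a, b)          # standard fuzzy-set union (element-wise max)

def fuzzy_intersection(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.minimum(a, b)          # standard fuzzy-set intersection (element-wise min)

def fuzzy_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.maximum(a - b, 0.0)    # features of a not shared with b

def asymmetric_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Degree to which a's features are contained in b; sim(a, b) != sim(b, a)."""
    return float(fuzzy_intersection(a, b).sum() / a.sum())

# Toy usage: a 5-word vocabulary with 4-dimensional embeddings.
rng = np.random.default_rng(0)
vocab_embeddings = rng.normal(size=(5, 4))
probs = to_feature_probabilities(vocab_embeddings)
print(asymmetric_similarity(probs[0], probs[1]))
print(asymmetric_similarity(probs[1], probs[0]))

Note that the asymmetry falls out of the containment formulation: a word whose features are a near-subset of another's scores high in one direction but not the other.
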
Anthology ID:
2020.repl4nlp-1.4
Volume:
Proceedings of the 5th Workshop on Representation Learning for NLP
Month:
July
Year:
2020
Address:
Online
Editors:
Spandana Gella, Johannes Welbl, Marek Rei, Fabio Petroni, Patrick Lewis, Emma Strubell, Minjoon Seo, Hannaneh Hajishirzi
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
24–33
URL:
https://aclanthology.org/2020.repl4nlp-1.4
DOI:
10.18653/v1/2020.repl4nlp-1.4
Cite (ACL):
Siddharth Bhat, Alok Debnath, Souvik Banerjee, and Manish Shrivastava. 2020. Word Embeddings as Tuples of Feature Probabilities. In Proceedings of the 5th Workshop on Representation Learning for NLP, pages 24–33, Online. Association for Computational Linguistics.
Cite (Informal):
Word Embeddings as Tuples of Feature Probabilities (Bhat et al., RepL4NLP 2020)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2020.repl4nlp-1.4.pdf
Video:
http://slideslive.com/38929770