Token Vector Spaces
This directory contains the token vector spaces for 181 synsets (concept profiles), visualized with googleVis. For each synset, the plot maps the semantic distances between the tokens of the types belonging to that synset. This makes it possible to explore to what extent two near-synonyms actually overlap in meaning and contextual usage.
Token Vector Spaces are a corpus-based, computational-linguistic method for capturing the meaning of individual occurrences of words (tokens). They belong to the general class of distributional models of lexical semantics (see Turney & Pantel for an overview).
Different implementations exist, but the models below make use of so-called second-order co-occurrences, a technique first developed by Hinrich Schütze (1998) for the task of word sense discrimination: each token is represented by combining the type-level co-occurrence vectors of the words in its context. The novelty is that we reduce the high-dimensional vector representation of each token to a two-dimensional visualization through Multidimensional Scaling (MDS), using the isoMDS function from the MASS library in R. The interactive plots were created with the R package googleVis (Gesmann & de Castillo 2011), which offers an interface between the statistical software R and the Google Visualisation API.
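
The idea behind the plots can be illustrated with a minimal sketch. This is not the code that produced the plots in this directory; it is a toy illustration under several assumptions. The corpus, the stop list and all function names are invented for the example, the whole sentence is taken as the context window, and no weighting of co-occurrence counts is applied. It is written in plain Python rather than R, and the reduction to two dimensions uses metric MDS (SMACOF iterations) instead of the non-metric MDS that isoMDS performs, so the resulting coordinates would differ from those of isoMDS.

```python
import math
import random
from collections import Counter, defaultdict

# Toy corpus, invented for illustration: each sentence is a list of word types.
CORPUS = [
    "the bank approved the loan".split(),
    "the bank raised the interest rate".split(),
    "she paid interest on the loan".split(),
    "we walked along the river bank".split(),
    "the river flooded the muddy shore".split(),
    "the boat reached the muddy bank".split(),
]
STOP = {"the", "on", "we", "she", "along"}  # hypothetical stop list


def first_order_vectors(corpus):
    """Type-level vectors: sentence-internal co-occurrence counts per word type."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        words = [w for w in sent if w not in STOP]
        for i, w in enumerate(words):
            for j, c in enumerate(words):
                if i != j:
                    vecs[w][c] += 1
    return vecs


def token_vector(sent, target, type_vecs):
    """Second-order vector of one token: sum of the type vectors of its context words."""
    vec = Counter()
    for w in sent:
        if w != target and w not in STOP:
            vec.update(type_vecs[w])
    return vec


def cosine_distance(a, b):
    """1 minus the cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def stress(points, dist):
    """Raw stress: squared difference between target and embedded distances."""
    n = len(points)
    return sum((dist[i][j] - math.dist(points[i], points[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


def metric_mds(dist, dims=2, iterations=200, seed=0):
    """Metric MDS via SMACOF (Guttman transform); NOT the non-metric isoMDS algorithm."""
    rng = random.Random(seed)
    n = len(dist)
    pts = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(iterations):
        new = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(pts[i], pts[j])
                ratio = dist[i][j] / d if d > 1e-12 else 0.0
                for k in range(dims):
                    new[i][k] += ratio * (pts[i][k] - pts[j][k])
        pts = [[x / n for x in row] for row in new]
    return pts


# One token vector per occurrence of the type "bank", then pairwise distances.
type_vecs = first_order_vectors(CORPUS)
bank_sents = [s for s in CORPUS if "bank" in s]
tokens = [token_vector(s, "bank", type_vecs) for s in bank_sents]
dist = [[cosine_distance(a, b) for b in tokens] for a in tokens]
coords = metric_mds(dist)

for (x, y), sent in zip(coords, bank_sents):
    print(f"{x:+.3f} {y:+.3f}  {' '.join(sent)}")
```

In this toy data the two financial uses of "bank" end up closer to each other in cosine distance than to the two riverside uses, which is the kind of grouping the plots are meant to make visible. For a synset, the token list would contain the occurrences of all types in the synset rather than of a single word.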