Token Vector Spaces

This directory contains the token vector spaces for 181 synsets (concept profiles), visualized with googleVis. For each synset, the plots map the semantic distances between the tokens of the types belonging to that synset. This makes it possible to explore to what extent two near-synonyms actually overlap in meaning and contextual usage.
Token Vector Spaces are a corpus-based, computational-linguistic method for capturing the meaning of individual occurrences of words (tokens). They belong to the general class of distributional models of lexical semantics (see Turney & Pantel for an overview). Different implementations exist, but the models below make use of so-called Second Order Co-occurrences, a technique first developed by Hinrich Schütze (1998) for the task of Word Sense Discrimination. The novelty here is that we reduce the high-dimensional vector representation of each token to a two-dimensional visualization through Multidimensional Scaling (MDS), using the isoMDS function from the MASS library in R. The interactive plots were created with the R package googleVis (Gesmann & de Castillo 2011), which offers an interface between the statistical software R and the Google Visualisation API.
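To make the idea of Second Order Co-occurrences more concrete, here is a minimal sketch in Python. It is an illustration only, not the code behind the plots in this directory, which were built in R. The toy corpus, the target word "bank" and all function names are assumptions made for the example. Each token of the target is represented by the sum of the first-order co-occurrence vectors of its context words, and tokens are then compared by cosine distance. A distance matrix of this kind is what an MDS routine such as isoMDS would take as input.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: already tokenized, function words removed.
CORPUS = [
    ["bank", "approved", "loan"],
    ["bank", "raised", "interest", "rate"],
    ["loan", "interest", "money"],
    ["river", "bank", "muddy"],
    ["fish", "river", "bank"],
    ["muddy", "water", "fish"],
]
TARGET = "bank"  # the word type whose tokens we want to compare


def first_order_vectors(sentences, target):
    """Count, for every word type, the words it co-occurs with in a sentence.

    The target itself is left out as a dimension, since it would be shared
    by every token context and blur the differences between them.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for j, other in enumerate(sentence):
                if i != j and other != target:
                    vectors[word][other] += 1
    return vectors


def token_vector(sentence, position, vectors):
    """Second-order vector of one token: sum of its context words' vectors."""
    total = Counter()
    for j, context_word in enumerate(sentence):
        if j != position:
            total.update(vectors[context_word])
    return total


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is empty."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def token_distance_matrix(sentences, target):
    """Pairwise cosine distances between all tokens of the target word."""
    vectors = first_order_vectors(sentences, target)
    tokens = [
        token_vector(sentence, position, vectors)
        for sentence in sentences
        for position, word in enumerate(sentence)
        if word == target
    ]
    return [[cosine_distance(a, b) for b in tokens] for a in tokens]


distances = token_distance_matrix(CORPUS, TARGET)
for row in distances:
    print(" ".join(f"{d:.2f}" for d in row))
```

In this toy setting the two financial tokens of "bank" end up closer to each other than to the two riverside tokens, even though their sentences share no context word directly. That is the point of going through second-order co-occurrences. Real models would use a large corpus, weighting of the co-occurrence counts and a context window rather than whole sentences.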