Abstract
The mathematical metaphor that the geometric concept of distance in vector spaces offers for semantics and meaning has proven useful in many monolingual natural language processing applications. Recent and strong evidence suggests that this paradigm is also useful in the cross-language setting. In this tutorial, we present and discuss some of the most recent advances in exploiting the vector space model paradigm in specific cross-language natural language processing applications, along with a comprehensive review of the theoretical background behind them.

First, the tutorial introduces some fundamental concepts of distributional semantics and vector space models. More specifically, the concepts of the distributional hypothesis and term-document matrices are reviewed, followed by a brief discussion of linear and non-linear dimensionality reduction techniques and their implications for the parallel distributed approach to semantic cognition. Next, some classical examples of using vector space models in monolingual natural language processing applications are presented. Specific examples in the areas of information retrieval, related term identification, and semantic compositionality are described.

Then, the tutorial turns to the use of the vector space model paradigm in cross-language applications. To this end, some recent examples are presented and discussed in detail, addressing the specific problems of cross-language information retrieval, cross-language sentence matching, and machine translation. Some of the most recent developments in the area of Neural Machine Translation are also discussed.

Finally, the tutorial concludes with a discussion of current and future research problems related to the use of vector space models in cross-language settings. Future avenues for scientific research are described, with major emphasis on the extension from vector and matrix representations to tensors, as well as the problem of encoding word position information into vector-based representations.