Anna Gladkova


2016

pdf bib
Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t.
Anna Gladkova | Aleksandr Drozd | Satoshi Matsuoka
Proceedings of the NAACL Student Research Workshop

pdf
Intrinsic Evaluations of Word Embeddings: What Can We Do Better?
Anna Gladkova | Aleksandr Drozd
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

pdf
The CogALex-V Shared Task on the Corpus-Based Identification of Semantic Relations
Enrico Santus | Anna Gladkova | Stefan Evert | Alessandro Lenci
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

The shared task of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V) aims at providing a common benchmark for testing current corpus-based methods for the identification of lexical semantic relations (synonymy, antonymy, hypernymy, part-whole meronymy) and at gaining a better understanding of their respective strengths and weaknesses. The shared task uses a challenging dataset extracted from EVALution 1.0, which contains word pairs holding the above-mentioned relations as well as semantically unrelated control items (random). The task is split into two subtasks: (i) identification of related word pairs vs. unrelated ones; (ii) classification of the word pairs according to their semantic relation. This paper describes the subtasks, the dataset, the evaluation metrics, the seven participating systems and their results. The best performing system in subtask 1 is GHHH (F1 = 0.790), while the best system in subtask 2 is LexNet (F1 = 0.445). The dataset and the task description are available at https://sites.google.com/site/cogalex2016/home/shared-task.

pdf
Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen
Aleksandr Drozd | Anna Gladkova | Satoshi Matsuoka
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Solving word analogies became one of the most popular benchmarks for word embeddings on the assumption that linear relations between word pairs (such as king:man :: woman:queen) are indicative of the quality of the embedding. We question this assumption by showing that the information not detected by linear offset may still be recoverable by a more sophisticated search method, and thus is actually encoded in the embedding. The general problem with linear offset is its sensitivity to the idiosyncrasies of individual words. We show that simple averaging over multiple word pairs improves over the state-of-the-art. A further improvement in accuracy (up to 30% for some embeddings and relations) is achieved by combining cosine similarity with an estimation of the extent to which a candidate answer belongs to the correct word class. In addition to this practical contribution, this work highlights the problem of the interaction between word embeddings and analogy retrieval algorithms, and its implications for the evaluation of word embeddings and the use of analogies in extrinsic tasks.