Konstantinos Skianis

2020

pdf abs
Evaluation of Greek Word Embeddings
Stamatis Outsios | Christos Karatsalos | Konstantinos Skianis | Michalis Vazirgiannis
Proceedings of the Twelfth Language Resources and Evaluation Conference

Since word embeddings have been the most popular input for many NLP tasks, evaluating their quality is critical. Most research efforts are focusing on English word embeddings. This paper addresses the problem of training and evaluating such models for the Greek language. We present a new word analogy test set considering the original English Word2vec analogy test set and some specific linguistic aspects of the Greek language as well. Moreover, we create a Greek version of WordSim353 test collection for a basic evaluation of word similarities. Produced resources are available for download. We test seven word vector models and our evaluation shows that we are able to create meaningful representations. Last, we discover that the morphological complexity of the Greek language and polysemy can influence the quality of the resulting word embeddings.

pdf abs
Evaluation of Machine Translation Methods applied to Medical Terminologies
Konstantinos Skianis | Yann Briand | Florent Desgrippes
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

Medical terminologies resources and standards play vital roles in clinical data exchanges, enabling significantly the services’ interoperability within healthcare national information networks. Health and medical science are constantly evolving causing requirements to advance the terminologies editions. In this paper, we present our evaluation work of the latest machine translation techniques addressing medical terminologies. Experiments have been conducted leveraging selected statistical and neural machine translation methods. The devised procedure is tested on a validated sample of ICD-11 and ICF terminologies from English to French with promising results.

2018

pdf abs
Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification
Konstantinos Skianis | Fragkiskos Malliaros | Michalis Vazirgiannis
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)

Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words(GoW) model in which each document is represented by a graph that encodes relationships between the different terms. Based on this formulation, the importance of a term is determined by weighting the corresponding node in the document, collection and label graphs, using node centrality criteria. We also introduce novel graph-based weighting schemes by enriching graphs with word-embedding similarities, in order to reward or penalize semantic relationships. Our methods produce more discriminative feature weights for text categorization, outperforming existing frequency-based criteria.

pdf abs
Orthogonal Matching Pursuit for Text Classification
Konstantinos Skianis | Nikolaos Tziortziotis | Michalis Vazirgiannis
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. On the contrary, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard group OMP by introducing overlapping Group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and very sparse models. Code and data are available online.