Anton Alekseev


2022

pdf
RuCCoN: Clinical Concept Normalization in Russian
Alexandr Nesterov | Galina Zubkova | Zulfat Miftahutdinov | Vladimir Kokh | Elena Tutubalina | Artem Shelmanov | Anton Alekseev | Manvel Avetisian | Andrey Chertok | Sergey Nikolenko
Findings of the Association for Computational Linguistics: ACL 2022

We present RuCCoN, a new dataset for clinical concept normalization in Russian manually annotated by medical professionals. It contains over 16,028 entity mentions manually linked to over 2,409 unique concepts from the Russian language part of the UMLS ontology. We provide train/test splits for different settings (stratified, zero-shot, and CUI-less) and present strong baselines obtained with state-of-the-art models such as SapBERT. At present, Russian medical NLP is lacking in both datasets and trained models, and we view this work as an important step towards filling this gap. Our dataset and annotation guidelines are available at https://github.com/sberbank-ai-lab/RuCCoN.

pdf
Medical Crossing: a Cross-lingual Evaluation of Clinical Entity Linking
Anton Alekseev | Zulfat Miftahutdinov | Elena Tutubalina | Artem Shelmanov | Vladimir Ivanov | Vladimir Kokh | Alexander Nesterov | Manvel Avetisian | Andrei Chertok | Sergey Nikolenko
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Medical data annotation requires highly qualified expertise. Despite the efforts devoted to medical entity linking in different languages, available data is very sparse in terms of both data volume and languages. In this work, we establish benchmarks for cross-lingual medical entity linking using clinical reports, clinical guidelines, and medical research papers. We present a test set filtering procedure designed to analyze the “hard cases” of entity linking approaching zero-shot cross-lingual transfer learning, evaluate state-of-the-art models, and draw several interesting conclusions based on our evaluation results.

2020

pdf
Ad Lingua: Text Classification Improves Symbolism Prediction in Image Advertisements
Andrey Savchenko | Anton Alekseev | Sejeong Kwon | Elena Tutubalina | Evgeny Myasnikov | Sergey Nikolenko
Proceedings of the 28th International Conference on Computational Linguistics

Understanding image advertisements is a challenging task, often requiring non-literal interpretation. We argue that standard image-based predictions are insufficient for symbolism prediction. Following the intuition that texts and images are complementary in advertising, we introduce a multimodal ensemble of a state of the art image-based classifier, a classifier based on an object detection architecture, and a fine-tuned language model applied to texts extracted from ads by OCR. The resulting system establishes a new state of the art in symbolism prediction.

2019


AspeRa: Aspect-Based Rating Prediction Based on User Reviews
Elena Tutubalina | Valentin Malykh | Sergey Nikolenko | Anton Alekseev | Ilya Shenbin
Proceedings of the 2019 Workshop on Widening NLP

We propose a novel Aspect-based Rating Prediction model (AspeRa) that estimates user rating based on review texts for the items. It is based on aspect extraction with neural networks and combines the advantages of deep learning and topic modeling. It is mainly designed for recommendations, but an important secondary goal of AspeRa is to discover coherent aspects of reviews that can be used to explain predictions or for user profiling. We conduct a comprehensive empirical study of AspeRa, showing that it outperforms state-of-the-art models in terms of recommendation quality and produces interpretable aspects. This paper is an abridged version of our work (Nikolenko et al., 2019)