Nikola Tulechki
Concept normalization of clinical texts to standard medical classifications and ontologies is a task of high importance for healthcare and medical research. We attempt to solve this problem through automatic SNOMED CT encoding, SNOMED CT being one of the most widely used and comprehensive clinical term ontologies. Applying basic deep learning models, however, leads to poor results due to the unbalanced nature of the data and the extremely large number of classes. We propose a classification procedure built as a multi-step workflow consisting of label clustering, multi-cluster classification, and clusters-to-labels mapping. For multi-cluster classification, BioBERT is fine-tuned on our custom dataset. The clusters-to-labels mapping is carried out by a one-vs-all classifier (SVC) applied to each individual cluster. We also present the steps for automatically generating a dataset of textual descriptions annotated with SNOMED CT codes from public data and linked open data. To cope with the high class imbalance of this dataset, we apply several data augmentation methods. The results of the conducted experiments show the high accuracy and reliability of our approach for predicting SNOMED CT codes relevant to a clinical text.
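To make the workflow concrete, here is a minimal sketch of steps 1 and 3 (label clustering and per-cluster one-vs-all SVC mapping). It is not the authors' implementation: the TF-IDF features, the k-means clustering criterion, and all parameter values are illustrative assumptions, and the BioBERT fine-tuning of step 2 is only indicated in a comment.

# Sketch of the multi-step workflow; features, clustering method, and all
# parameters are illustrative assumptions, not the paper's exact setup.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def build_pipeline(label_texts, train_texts, train_codes, n_clusters=50):
    # train_codes are integer indices into label_texts (one per SNOMED CT code).
    vec = TfidfVectorizer().fit(label_texts + train_texts)

    # Step 1: cluster the SNOMED CT label descriptions to shrink the
    # label space from thousands of codes to n_clusters groups.
    code_to_cluster = KMeans(n_clusters=n_clusters, random_state=0) \
        .fit_predict(vec.transform(label_texts))

    # Step 2 (omitted): fine-tune BioBERT (e.g. the "dmis-lab/biobert-v1.1"
    # checkpoint with transformers' AutoModelForSequenceClassification) to
    # predict the cluster of a clinical text.

    # Step 3: within each cluster, train a one-vs-rest linear SVC that maps
    # a text to one of the SNOMED CT codes belonging to that cluster.
    codes = np.asarray(train_codes)
    clusters = code_to_cluster[codes]
    X = vec.transform(train_texts)
    svcs = {}
    for cl in np.unique(clusters):
        idx = clusters == cl
        if len(np.unique(codes[idx])) > 1:  # single-code clusters need no SVC
            svcs[cl] = LinearSVC().fit(X[idx], codes[idx])
    return vec, code_to_cluster, svcs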
This paper addresses the task of categorizing companies within industry classification schemes. The dataset consists of encyclopedic articles about companies and their economic activities. The target classification schema is built by mapping linked open data in a semi-supervised manner, with target classes constructed bottom-up from DBpedia. We apply several state-of-the-art text classification techniques, based on both deep learning and classical vector-space models.
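As a rough illustration of how such a schema can be assembled bottom-up from linked open data, the sketch below queries DBpedia's public SPARQL endpoint for company abstracts and their dbo:industry links, then fits a simple vector-space baseline. The query shape, the choice of dbo:industry as the class source, and the logistic-regression baseline are our assumptions for illustration, not the paper's exact procedure.

# Illustrative only: collect (abstract, industry) pairs from DBpedia and
# fit a classical vector-space classifier on them.
from SPARQLWrapper import SPARQLWrapper, JSON
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def fetch_company_industries(limit=500):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract ?industry WHERE {{
            ?company a dbo:Company ;
                     dbo:industry ?industry ;
                     dbo:abstract ?abstract .
            FILTER (lang(?abstract) = "en")
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    texts = [r["abstract"]["value"] for r in rows]
    labels = [r["industry"]["value"] for r in rows]
    return texts, labels

texts, labels = fetch_company_industries()
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(texts, labels)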
This article presents applications of natural language processing (NLP) tools and methods to industrial risk management through the analysis of textual data from large experience-feedback (REX) databases. It first describes the domain of safety management, its political and social aspects, as well as the work of safety experts and the needs they express. It then presents a series of techniques, such as automatic document classification, subjectivity detection, and clustering, adapted to REX data and intended to meet these present and future needs, in the form of tools supporting the experts' work.
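One of the techniques mentioned above, subjectivity detection, can be illustrated in a few lines. The sketch below applies TextBlob's lexicon-based subjectivity score to invented REX-style sentences; it is a stand-in for whatever model the authors deploy, not their tool.

# Toy subjectivity spotting on invented REX-style records; TextBlob's
# lexicon-based score is a stand-in, not the authors' model.
from textblob import TextBlob

reports = [
    "The valve was replaced at 14:30 after the pressure alarm triggered.",
    "I think the operator was clearly overworked that night.",
]
for text in reports:
    score = TextBlob(text).sentiment.subjectivity  # 0.0 objective .. 1.0 subjective
    print(f"{score:.2f}  {text}")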