Nikola Tulechki
Concept normalization of clinical texts to standard medical classifications and ontologies is a task of high importance for healthcare and medical research. We attempt to solve this problem through automatic SNOMED CT encoding, SNOMED CT being one of the most widely used and comprehensive clinical term ontologies. Applying basic deep learning models, however, leads to poor results due to the unbalanced nature of the data and the extremely large number of classes. We propose a classification procedure built as a multi-step workflow consisting of label clustering, multi-cluster classification, and clusters-to-labels mapping. For multi-cluster classification, BioBERT is fine-tuned on our custom dataset. The clusters-to-labels mapping is carried out by a one-vs-all classifier (SVC) applied to each individual cluster. We also present the steps for automatically generating a dataset of textual descriptions annotated with SNOMED CT codes from public data and linked open data. To cope with the high class imbalance of this dataset, we apply several data augmentation methods. The results of the conducted experiments show the high accuracy and reliability of our approach for predicting SNOMED CT codes relevant to a clinical text.
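To make the workflow concrete, here is a minimal sketch of steps 1 and 3 (label clustering and per-cluster one-vs-all SVC mapping). It is not the authors' implementation: the TF-IDF features, the k-means clustering criterion, and all parameter values are illustrative assumptions, and the BioBERT fine-tuning of step 2 is only indicated in a comment.

# Sketch of the multi-step workflow; features, clustering method, and all
# parameters are illustrative assumptions, not the paper's exact setup.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def build_pipeline(label_texts, train_texts, train_codes, n_clusters=50):
    # train_codes are integer indices into label_texts (one per SNOMED CT code).
    vec = TfidfVectorizer().fit(label_texts + train_texts)

    # Step 1: cluster the SNOMED CT label descriptions to shrink the
    # label space from thousands of codes to n_clusters groups.
    code_to_cluster = KMeans(n_clusters=n_clusters, random_state=0) \
        .fit_predict(vec.transform(label_texts))

    # Step 2 (omitted): fine-tune BioBERT (e.g. the "dmis-lab/biobert-v1.1"
    # checkpoint with transformers' AutoModelForSequenceClassification) to
    # predict the cluster of a clinical text.

    # Step 3: within each cluster, train a one-vs-rest linear SVC that maps
    # a text to one of the SNOMED CT codes belonging to that cluster.
    codes = np.asarray(train_codes)
    clusters = code_to_cluster[codes]
    X = vec.transform(train_texts)
    svcs = {}
    for cl in np.unique(clusters):
        idx = clusters == cl
        if len(np.unique(codes[idx])) > 1:  # single-code clusters need no SVC
            svcs[cl] = LinearSVC().fit(X[idx], codes[idx])
    return vec, code_to_cluster, svcs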
This paper addresses the task of categorizing companies within industry classification schemes. The dataset consists of encyclopedic articles about companies and their economic activities. The target classification schema is built by mapping linked open data in a semi-supervised manner, with target classes constructed bottom-up from DBpedia. We apply several state-of-the-art text classification techniques, based on both deep learning and classical vector-space models.
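As a rough illustration of how such a schema can be assembled bottom-up from linked open data, the sketch below queries DBpedia's public SPARQL endpoint for company abstracts and their dbo:industry links, then fits a simple vector-space baseline. The query shape, the choice of dbo:industry as the class source, and the logistic-regression baseline are our assumptions for illustration, not the paper's exact procedure.

# Illustrative only: collect (abstract, industry) pairs from DBpedia and
# fit a classical vector-space classifier on them.
from SPARQLWrapper import SPARQLWrapper, JSON
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def fetch_company_industries(limit=500):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract ?industry WHERE {{
            ?company a dbo:Company ;
                     dbo:industry ?industry ;
                     dbo:abstract ?abstract .
            FILTER (lang(?abstract) = "en")
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    texts = [r["abstract"]["value"] for r in rows]
    labels = [r["industry"]["value"] for r in rows]
    return texts, labels

texts, labels = fetch_company_industries()
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(texts, labels)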
This article presents applications of natural language processing (NLP) tools and methods to industrial risk management through the analysis of textual data from large experience-feedback (REX) databases. It first describes the domain of safety management, its political and social aspects, as well as the work of safety experts and the needs they express. It then presents a series of techniques, such as automatic document classification, subjectivity detection, and clustering, adapted to REX data and intended to meet these present and future needs, in the form of tools supporting the experts' work.
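One of the techniques mentioned above, subjectivity detection, can be illustrated in a few lines. The sketch below applies TextBlob's lexicon-based subjectivity score to invented REX-style sentences; it is a stand-in for whatever model the authors deploy, not their tool.

# Toy subjectivity spotting on invented REX-style records; TextBlob's
# lexicon-based score is a stand-in, not the authors' model.
from textblob import TextBlob

reports = [
    "The valve was replaced at 14:30 after the pressure alarm triggered.",
    "I think the operator was clearly overworked that night.",
]
for text in reports:
    score = TextBlob(text).sentiment.subjectivity  # 0.0 objective .. 1.0 subjective
    print(f"{score:.2f}  {text}")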