Sergio Santamaría Carrasco


2021

Word Embeddings, Cosine Similarity and Deep Learning for Identification of Professions & Occupations in Health-related Social Media
Sergio Santamaría Carrasco | Roberto Cuervo Rosillo
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

ProfNER-ST focuses on the recognition of professions and occupations in Spanish-language Twitter data. Our system combines word-level embeddings, including pre-trained Spanish BERT, with cosine-similarity features computed over a subset of entities; together these serve as input to an encoder-decoder architecture with an attention mechanism. Our best run achieved an F1 score of 0.823 on the official test set.
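
As a rough illustration of the cosine-similarity component, the sketch below compares a token embedding against a small set of reference entity embeddings and appends the resulting scores to the token vector. The abstract does not say how the similarities are computed or combined with the embeddings, so the function names, the toy three-dimensional vectors, and the concatenation step are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(token_vec, entity_vecs):
    """One cosine score per reference entity embedding (hypothetical feature layout)."""
    return [cosine_similarity(token_vec, e) for e in entity_vecs]


# Toy data: two reference entity embeddings and one token embedding (assumed values).
entity_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
token_vec = [2.0, 0.0, 0.0]

features = similarity_features(token_vec, entity_vecs)

# Assumed combination step: append the similarity scores to the token embedding
# before passing the result to a downstream sequence model.
augmented = token_vec + features
```

In a real system the vectors would come from the pre-trained embedding models and the reference set from annotated profession mentions in the training data; both are left out here to keep the sketch self-contained.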