CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets

Harsh Verma, Parsa Bagherzadeh, Sabine Bergler


Abstract
This paper summarizes the CLaC submission for SMM4H 2022 Task 10, which concerns the recognition of disease mentions in Spanish tweets. We encode each token with a transformer encoder using features from Multilingual RoBERTa Large, a UMLS gazetteer, and a DISTEMIST gazetteer, among others, and then classify it. We obtain a strict F1 score of 0.869, compared with a competition mean of 0.675, a standard deviation of 0.245, and a median of 0.761.
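As an illustration of the gazetteer features mentioned in the abstract, the following is a minimal sketch of one common way to derive a per-token gazetteer flag. It is not taken from the paper or its code repository. The function name `gazetteer_flags`, the `max_len` parameter, the toy gazetteer entries, and the example sentence are all hypothetical. In a full system, a flag like this could be combined with each token's contextual embedding before classification; how the authors actually combine their features is not stated here.

```python
from typing import List, Set, Tuple


def gazetteer_flags(
    tokens: List[str],
    gazetteer: Set[Tuple[str, ...]],
    max_len: int = 4,
) -> List[int]:
    """Return 1 for each token covered by a gazetteer entry, else 0.

    Matching is case-insensitive and works on whole tokens. This is an
    assumed, simplified matching scheme, not the authors' method.
    """
    lowered = [t.lower() for t in tokens]
    flags = [0] * len(tokens)
    for start in range(len(lowered)):
        # Try the longest candidate span first, then shorter ones.
        longest = min(max_len, len(lowered) - start)
        for length in range(longest, 0, -1):
            span = tuple(lowered[start:start + length])
            if span in gazetteer:
                # Mark every token inside the matched span.
                for i in range(start, start + length):
                    flags[i] = 1
                break
    return flags


# Hypothetical toy gazetteer; real entries would come from UMLS or DISTEMIST.
toy_gazetteer = {("diabetes",), ("cáncer", "de", "mama")}
tokens = ["Mi", "madre", "tiene", "cáncer", "de", "mama", "y", "diabetes"]
flags = gazetteer_flags(tokens, toy_gazetteer)
print(list(zip(tokens, flags)))
```

Storing entries as tuples of lowercased tokens keeps each lookup a constant-time set membership test, so the cost per sentence grows with the sentence length times `max_len`.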
Anthology ID:
2022.smm4h-1.16
Volume:
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
SMM4H
Publisher:
Association for Computational Linguistics
Pages:
55–57
URL:
https://aclanthology.org/2022.smm4h-1.16
Cite (ACL):
Harsh Verma, Parsa Bagherzadeh, and Sabine Bergler. 2022. CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets. In Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task, pages 55–57, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets (Verma et al., SMM4H 2022)
PDF:
https://preview.aclanthology.org/starsem-semeval-split/2022.smm4h-1.16.pdf
Code:
harshshredding/smm4h-2022-social-dis-ner-submission