Eulàlia Farré-Maduell

Also published as: Eulalia Farre-Maduell


The SocialDisNER shared task on detection of disease mentions in health-relevant content from social media: methods, evaluation, guidelines and corpora
Luis Gasco Sánchez | Darryl Estrada Zavala | Eulàlia Farré-Maduell | Salvador Lima-López | Antonio Miranda-Escalada | Martin Krallinger
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

There is a pressing need to exploit health-related content from social media, a global source of data where key health information is posted directly by citizens, patients and other healthcare stakeholders. Use cases of disease-related social media mining include disease outbreak/surveillance, mental health and pharmacovigilance. Current efforts address the exploitation of social media beyond English. The SocialDisNER task, organized as part of the SMM4H 2022 initiative, has applied the LINKAGE methodology to select and annotate a Gold Standard corpus of 9,500 tweets in Spanish enriched with disease mentions generated by patients and medical professionals. As a complementary resource for teams participating in the SocialDisNER track, we have also created a large-scale corpus of 85,000 tweets, where in addition to disease mentions, other medical entities of relevance (e.g., medications, symptoms and procedures) have been automatically labelled. Using these large-scale datasets, co-mention networks or knowledge graphs were released for each entity pair type. Out of the 47 teams registered for the task, 17 teams uploaded a total of 32 runs. The top-performing team achieved a very competitive 0.891 F-score, with a system trained following a continued pre-training strategy. We anticipate that the corpus and systems resulting from the SocialDisNER track might further foster health-related text mining of social media content in Spanish and inspire disease detection strategies in other languages.


Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Arjun Magge | Ari Klein | Antonio Miranda-Escalada | Mohammed Ali Al-garadi | Ilseyar Alimova | Zulfat Miftahutdinov | Eulalia Farre-Maduell | Salvador Lima Lopez | Ivan Flores | Karen O'Connor | Davy Weissenbacher | Elena Tutubalina | Abeed Sarker | Juan M Banda | Martin Krallinger | Graciela Gonzalez-Hernandez
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

The ProfNER shared task on automatic recognition of occupation mentions in social media: systems, evaluation, guidelines, embeddings and corpora
Antonio Miranda-Escalada | Eulàlia Farré-Maduell | Salvador Lima-López | Luis Gascó | Vicent Briva-Iglesias | Marvin Agüero-Torales | Martin Krallinger
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

Detection of occupations in texts is relevant for a range of important application scenarios, like competitive intelligence, sociodemographic analysis, legal NLP or health-related occupational data mining. Despite their importance and the heterogeneous data types that mention occupations, text mining efforts to recognize them have been limited. This is due to the lack of clear annotation guidelines and high-quality Gold Standard corpora. Social media data can be regarded as a relevant source of information for real-time monitoring of at-risk occupational groups in the context of pandemics such as COVID-19, facilitating intervention strategies for occupations in direct contact with infectious agents or affected by mental health issues. To evaluate current NLP methods and to generate resources, we have organized the ProfNER track at SMM4H 2021, providing ProfNER participants with a Gold Standard corpus of manually annotated tweets (human IAA of 0.919) following annotation guidelines available in Spanish and English, an occupation gazetteer, a machine-translated version of tweets, and FastText embeddings. Out of 35 registered teams, 11 submitted a total of 27 runs. Best-performing participants built systems based on recent NLP technologies (e.g., transformers) and achieved 0.93 F-score in Text Classification and 0.839 in Named Entity Recognition. Corpus: