Max Fleming


2021

We describe our straightforward approach for Tasks 5 and 6 of the 2021 Social Media Mining for Health Applications (SMM4H) shared tasks. Our system is based on fine-tuning DistilBERT on each task, as well as first fine-tuning the model on the other task. In this paper, we additionally explore how much fine-tuning is necessary for accurately classifying whether a tweet contains self-reported COVID-19 symptoms (Task 5) and whether a tweet related to COVID-19 is a self-report, a non-personal report, or a literature/news mention of the virus (Task 6).

2020

We release a dataset of over 2,100 COVID-19-related frequently asked question-answer pairs scraped from over 40 trusted websites. We include an additional 24,000 questions pulled from online sources that have been aligned by experts with existing answered questions from our dataset. This paper describes our efforts in collecting the dataset and summarizes the resulting data. Our dataset is automatically updated daily and available at https://github.com/JHU-COVID-QA/scraping-qas. So far, this data has been used to develop a chatbot providing users information about COVID-19. We encourage others to build analytics and tools upon this dataset as well.