Lin Miao
2022
An Interactive Analysis of User-reported Long COVID Symptoms using Twitter Data
Lin Miao | Mark Last | Marina Litvak
Proceedings of the 2nd Workshop on Deriving Insights from User-Generated Text
With millions of documented recoveries from COVID-19 worldwide, various long-term sequelae have been observed in a large group of survivors. This paper aims to systematically analyze user-generated conversations on Twitter related to long-term COVID symptoms, for a better understanding of the health consequences of Long COVID. Using an interactive information extraction tool built especially for this purpose, we extracted key information from the relevant tweets and analyzed the user-reported Long COVID symptoms with respect to their demographic and geographical characteristics. The results of our analysis are expected to improve public awareness of long-term COVID-19 sequelae and provide important insights to public health authorities.
2020
Detecting Troll Tweets in a Bilingual Corpus
Lin Miao | Mark Last | Marina Litvak
Proceedings of the Twelfth Language Resources and Evaluation Conference
During the past several years, a large number of troll accounts have emerged in an effort to manipulate public opinion on social network sites. They are often involved in spreading misinformation, fake news, and propaganda with the intent of distracting and sowing discord. This paper aims to detect troll tweets in both English and Russian, assuming that the tweets are generated by some “troll farm.” We reduce this task to the authorship verification problem of determining whether a single tweet is authored by a “troll farm” account or not. We evaluate a supervised classification approach with monolingual, cross-lingual, and bilingual training scenarios, using several machine learning algorithms, including deep learning. The best results are attained by bilingual learning, with an area under the ROC curve (AUC) of 0.875 and 0.828 for tweet classification on the English and Russian test sets, respectively. It is noteworthy that these results are obtained using only raw text features, which do not require manual feature engineering. We also introduce a resource of English and Russian troll tweets containing the original tweets and their translations from English to Russian and from Russian to English. It is available for academic purposes.
Twitter Data Augmentation for Monitoring Public Opinion on COVID-19 Intervention Measures
Lin Miao | Mark Last | Marina Litvak
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
The COVID-19 outbreak is an ongoing worldwide pandemic that was announced as a global health crisis in March 2020. Due to the enormous challenges and high stakes of this pandemic, governments have implemented a wide range of policies aimed at containing the spread of the virus and its negative effects on multiple aspects of our lives. Public responses to the various intervention measures imposed over time can be explored by analyzing social media. Due to the shortage of available labeled data for this new and evolving domain, we apply a data distillation methodology to labeled datasets from related tasks and a very small manually labeled dataset. Our experimental results show that data distillation outperforms other data augmentation methods on our task.