Shuang Gao


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2021

pdf bib
Topic Modeling for Maternal Health Using Reddit
Shuang Gao | Shivani Pandya | Smisha Agarwal | João Sedoc
Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis

This paper applies topic modeling to understand maternal health topics, concerns, and questions expressed in online communities on social networking sites. We examine Latent Dirichlet Analysis (LDA) and two state-of-the-art methods: neural topic model with knowledge distillation (KD) and Embedded Topic Model (ETM) on maternal health texts collected from Reddit. The models are evaluated on topic quality and topic inference, using both auto-evaluation metrics and human assessment. We analyze a disconnect between automatic metrics and human evaluations. While LDA performs the best overall with the auto-evaluation metrics NPMI and Coherence, Neural Topic Model with Knowledge Distillation is favorable by expert evaluation. We also create a new partially expert annotated gold-standard maternal health topic