Chaoyuan Zuo


2021

An Empirical Assessment of the Qualitative Aspects of Misinformation in Health News
Chaoyuan Zuo | Qi Zhang | Ritwik Banerjee
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

The explosion of online health news articles risks the proliferation of low-quality information. Within the existing work on fact-checking, however, relatively little attention has been paid to medical news. We present a health news classification task to determine whether medical news articles satisfy a set of review criteria deemed important by medical experts and health care journalists. We present a dataset of 1,119 health news articles paired with systematic reviews. The review criteria consist of six elements that are essential to the accuracy of medical news. We then present experiments comparing the classical token-based approach with the more recent transformer-based models. Our results show that detecting qualitative lapses is a challenging task with direct ramifications for misinformation, but is an important direction to pursue beyond assigning True or False labels to short claims.

2020

Querying Across Genres for Medical Claims in News
Chaoyuan Zuo | Narayan Acharya | Ritwik Banerjee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present a query-based biomedical information retrieval task across two vastly different genres – newswire and research literature – where the goal is to find the research publication that supports the primary claim made in a health-related news article. For this task, we present a new dataset of 5,034 claims from news paired with research abstracts. Our approach consists of two steps: (i) selecting the most relevant candidates from a collection of 222k research abstracts, and (ii) re-ranking this list. We compare the classical IR approach using BM25 with more recent transformer-based models. Our results show that cross-genre medical IR is a viable task, but incorporating domain-specific knowledge is crucial.
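
The two-step approach described in this abstract (candidate selection, then re-ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the BM25 parameters `k1=1.5` and `b=0.75`, the whitespace tokenizer, the toy abstracts, and the `overlap_scorer` function are all assumptions made for illustration. In particular, `overlap_scorer` is a simple Jaccard-overlap stand-in for the transformer-based re-ranker mentioned in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    """Small Okapi BM25 index (illustrative, not the paper's code)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in self.doc_tokens]
        self.avgdl = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in self.doc_tokens]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for tokens in self.doc_tokens:
            df.update(set(tokens))
        n = len(docs)
        self.idf = {
            w: math.log(1 + (n - f + 0.5) / (f + 0.5)) for w, f in df.items()
        }

    def score(self, query, i):
        total = 0.0
        for w in tokenize(query):
            f = self.tf[i].get(w, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * self.doc_len[i] / self.avgdl
            total += self.idf[w] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def top_k(self, query, k):
        # Step (i): select the k highest-scoring candidate indices.
        order = sorted(
            range(len(self.doc_tokens)),
            key=lambda i: self.score(query, i),
            reverse=True,
        )
        return order[:k]


def overlap_scorer(query, doc):
    # Hypothetical stand-in for a transformer re-ranker: Jaccard overlap.
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(query, candidates, docs, scorer):
    # Step (ii): re-order the candidate list with a second scoring function.
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)


# Toy data, invented for illustration only.
abstracts = [
    "coffee consumption is associated with lower risk of liver disease",
    "a randomized trial of statins for cardiovascular prevention",
    "deep learning for image segmentation in radiology",
]
claim = "drinking coffee may lower liver disease risk"

index = BM25(abstracts)
candidates = index.top_k(claim, k=2)
ranked = rerank(claim, candidates, abstracts, overlap_scorer)
```

In a realistic setting the second scorer would be replaced by a learned model, and the candidate pool would be far larger than two abstracts; the sketch only shows how the two stages fit together.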