Ritwik Banerjee


2021

An Empirical Assessment of the Qualitative Aspects of Misinformation in Health News
Chaoyuan Zuo | Qi Zhang | Ritwik Banerjee
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

The explosion of online health news articles risks the proliferation of low-quality information. Within the existing work on fact-checking, however, relatively little attention has been paid to medical news. We present a health news classification task to determine whether medical news articles satisfy a set of review criteria deemed important by medical experts and health care journalists. We present a dataset of 1,119 health news articles paired with systematic reviews. The review criteria consist of six elements that are essential to the accuracy of medical news. We then present experiments comparing the classical token-based approach with more recent transformer-based models. Our results show that detecting qualitative lapses is a challenging task with direct ramifications for misinformation, but is an important direction to pursue beyond assigning True or False labels to short claims.

An Investigation into the Contribution of Locally Aggregated Descriptors to Figurative Language Identification
Sina Mahdipour Saravani | Ritwik Banerjee | Indrakshi Ray
Proceedings of the Second Workshop on Insights from Negative Results in NLP

In natural language understanding, topics that touch upon figurative language and pragmatics are notably difficult. We probe a novel use of locally aggregated descriptors, specifically an architecture called NeXtVLAD, which, motivated by its accomplishments in computer vision, was reported to achieve tremendous success in the FigLang2020 sarcasm detection task. The reported F1 score of 93.1% is 14% higher than the next best result. We specifically investigate the extent to which the novel architecture is responsible for this boost, and find that it does not provide statistically significant benefits. Deep learning approaches are expensive, and we hope our insights highlighting the lack of benefits from introducing a resource-intensive component will aid future research to distill the effective elements from long and complex pipelines, thereby providing a boost to the wider research community.
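"Locally aggregated descriptors" refers to the VLAD family of pooling methods. As a rough illustration only, the sketch below implements classic hard-assignment VLAD in plain Python, treating token embeddings as the local descriptors (an assumption) and taking the cluster centers as given rather than learned. It is not NeXtVLAD, which learns soft assignments and splits features into lower-dimensional groups before aggregating; the function name `vlad` and the toy data are hypothetical.

```python
import math


def vlad(descriptors, centers):
    """Classic hard-assignment VLAD (illustrative sketch, not NeXtVLAD).

    Each descriptor is assigned to its nearest center; the residuals
    (descriptor minus center) are summed per center, concatenated, and
    L2-normalized into a single fixed-length vector.
    """
    k, dim = len(centers), len(centers[0])
    residuals = [[0.0] * dim for _ in range(k)]
    for x in descriptors:
        # Nearest cluster center by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, centers[j])),
        )
        # Accumulate the residual for that center only (hard assignment).
        for t in range(dim):
            residuals[nearest][t] += x[t] - centers[nearest][t]
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy example: three 2-d "token embeddings" pooled over two fixed centers.
centers = [[0.0, 0.0], [10.0, 10.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
encoding = vlad(tokens, centers)
print(encoding)
```

Here the first two descriptors contribute residuals to the center at (0, 0) and the third to the center at (10, 10), so a variable-length sequence of descriptors becomes one vector of length (number of centers) × (dimension).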

2020

Querying Across Genres for Medical Claims in News
Chaoyuan Zuo | Narayan Acharya | Ritwik Banerjee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present a query-based biomedical information retrieval task across two vastly different genres (newswire and research literature), where the goal is to find the research publication that supports the primary claim made in a health-related news article. For this task, we present a new dataset of 5,034 claims from news paired with research abstracts. Our approach consists of two steps: (i) selecting the most relevant candidates from a collection of 222k research abstracts, and (ii) re-ranking this list. We compare the classical IR approach using BM25 with more recent transformer-based models. Our results show that cross-genre medical IR is a viable task, but incorporating domain-specific knowledge is crucial.
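The two-step pipeline in this abstract (candidate selection with BM25, then re-ranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: queries and documents are pre-tokenized lists of lowercase words, `k1` and `b` take common default values, the IDF uses the Lucene-style smoothed form, and the optional `rerank` callable stands in for a transformer-based scorer. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized doc against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of docs containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed (Lucene-style) IDF, always non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def retrieve_then_rerank(query, docs, k=2, rerank=None):
    """Step (i): keep the top-k BM25 candidates; step (ii): optionally re-rank."""
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    if rerank is not None:
        # Hypothetical hook for a stronger (e.g. transformer) relevance model.
        candidates = sorted(candidates, key=lambda i: rerank(query, docs[i]), reverse=True)
    return candidates


# Toy example: three tokenized "abstracts" and one news-claim query.
docs = [
    "vitamin d supplements reduce fracture risk in older adults".split(),
    "coffee consumption and liver disease cohort study".split(),
    "randomized trial of vitamin d and bone density".split(),
]
query = "vitamin d fracture".split()
print(retrieve_then_rerank(query, docs, k=2))  # prints [0, 2]
```

Keeping a cheap lexical retriever in front limits the expensive re-ranker to a short candidate list, which is what makes the second step affordable over a large abstract collection.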

2014

Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays
Ritwik Banerjee | Song Feng | Jun Seok Kang | Yejin Choi
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

Characterizing Stylistic Elements in Syntactic Structure
Song Feng | Ritwik Banerjee | Yejin Choi
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Syntactic Stylometry for Deception Detection
Song Feng | Ritwik Banerjee | Yejin Choi
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)