Haichao Shi




2025

Generate First, Then Sample: Enhancing Fake News Detection with LLM-Augmented Reinforced Sampling
Zhao Tong | Yimeng Gu | Huidong Liu | Qiang Liu | Shu Wu | Haichao Shi | Xiao-Yu Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The spread of fake news on online platforms has long been a pressing concern, and extensive efforts have been made to develop fake news detectors. However, a major drawback of these models is their relatively low performance in identifying fake news compared to real news (lagging by more than 20%), which makes them less suitable for practical deployment. This gap is likely due to an imbalance in the dataset and the model’s inadequate understanding of the data distribution on the targeted platform. In this work, we focus on improving the model’s effectiveness in detecting fake news. To achieve this, we first adopt an LLM to generate fake news in three different styles, which are then incorporated into the training set to augment the representation of fake news. Next, we apply Reinforcement Learning to dynamically sample fake news, allowing the model to learn the optimal real-to-fake news ratio for training an effective fake news detector on the targeted platform. This approach allows our model to perform effectively even with a limited amount of annotated news data and to consistently improve detection accuracy across different platforms. Experimental results demonstrate that our approach achieves state-of-the-art performance on two benchmark datasets, improving fake news detection performance by 24.02% and 11.06%, respectively.
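
The "generate first, then sample" idea can be illustrated with a toy sketch. The abstract does not say which Reinforcement Learning algorithm, state, or reward the authors use, so the code below substitutes a plain epsilon-greedy bandit over a handful of candidate ratios as a stand-in. The function names `choose_fake_ratio` and `build_training_set`, the `reward_fn` callback (imagined here as validation performance of a detector trained at a given ratio), and all parameter values are assumptions made for illustration, not the paper's implementation. The ratio is expressed as fake items per real item.

```python
import random


def choose_fake_ratio(ratios, reward_fn, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over candidate fake-to-real ratios.

    Illustrative stand-in only: the paper's actual RL sampler is not
    described in the abstract. `reward_fn(ratio, rng)` is a hypothetical
    callback returning a score for training at that ratio.
    """
    rng = random.Random(seed)
    counts = [0] * len(ratios)
    values = [0.0] * len(ratios)  # running mean reward per candidate ratio

    for _ in range(steps):
        explore = rng.random() < epsilon
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            arm = untried[0]  # evaluate every ratio at least once
        elif explore:
            arm = rng.randrange(len(ratios))
        else:
            arm = max(range(len(ratios)), key=values.__getitem__)

        reward = reward_fn(ratios[arm], rng)
        counts[arm] += 1
        # incremental update of the mean reward for this ratio
        values[arm] += (reward - values[arm]) / counts[arm]

    best = max(range(len(ratios)), key=values.__getitem__)
    return ratios[best], values


def build_training_set(real, generated_fake, ratio, rng):
    """Add LLM-generated fake items so that fake/real is roughly `ratio`."""
    k = min(len(generated_fake), round(ratio * len(real)))
    return real + rng.sample(generated_fake, k)
```

A possible usage, with a synthetic reward that peaks at a ratio of 0.5 in place of a real train-and-validate loop:

```python
best, _ = choose_fake_ratio(
    [0.0, 0.25, 0.5, 0.75, 1.0],
    lambda ratio, rng: -(ratio - 0.5) ** 2,  # toy reward, not a real metric
)
train = build_training_set(["r1", "r2", "r3", "r4"], ["f1", "f2", "f3"],
                           best, random.Random(0))
```

In practice each reward evaluation would require training or fine-tuning a detector, which is far more expensive than this toy loop suggests.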