Huidong Liu
2025
Generate First, Then Sample: Enhancing Fake News Detection with LLM-Augmented Reinforced Sampling
Zhao Tong | Yimeng Gu | Huidong Liu | Qiang Liu | Shu Wu | Haichao Shi | Xiao-Yu Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The spread of fake news on online platforms has long been a pressing concern. Considering this, extensive efforts have been made to develop fake news detectors. However, a major drawback of these models is their relatively low performance—lagging by more than 20%—in identifying *fake* news compared to *real* news, making them less suitable for practical deployment. This gap is likely due to an imbalance in the dataset and the model’s inadequate understanding of data distribution on the targeted platform. In this work, we focus on improving the model’s effectiveness in detecting *fake* news. To achieve this, we **first** adopt an LLM to **generate** fake news in three different styles, which are later incorporated into the training set to augment the representation of fake news. **Then**, we apply Reinforcement Learning to dynamically **sample** fake news, allowing the model to learn the optimal real-to-fake news ratio for training an effective fake news detector on the targeted platform. This approach allows our model to perform effectively even with a limited amount of annotated news data and to consistently improve detection accuracy across different platforms. Experimental results demonstrate that our approach achieves state-of-the-art performance on two benchmark datasets, improving *fake* news detection performance by 24.02% and 11.06%, respectively.
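The reinforced sampling step described in the abstract can be pictured as a search over candidate real-to-fake ratios, rewarded by how well the detector recovers *fake* news on held-out data. The sketch below is only an illustrative assumption of that idea (an epsilon-greedy bandit over a fixed set of ratios); the function names, candidate ratios, and reward definition are hypothetical and not the paper's implementation.

```python
# Hypothetical sketch of "generate first, then sample": an epsilon-greedy
# bandit over candidate real-to-fake ratios. All names and the reward
# definition are illustrative assumptions, not the paper's method.
import random

CANDIDATE_RATIOS = [1.0, 2.0, 3.0, 4.0]  # real : fake examples per batch


def build_batch(real_pool, fake_pool, ratio, batch_size=32):
    """Sample a training batch with the given real-to-fake ratio."""
    n_fake = max(1, int(batch_size / (1 + ratio)))
    n_real = batch_size - n_fake
    return random.sample(real_pool, n_real) + random.sample(fake_pool, n_fake)


def reinforced_ratio_search(real_pool, fake_pool, train_step, fake_recall,
                            epochs=10, epsilon=0.2):
    """Epsilon-greedy search for the ratio that maximizes fake-news recall.

    train_step(batch) updates the detector; fake_recall() evaluates it on a
    held-out set and serves as the reward signal.
    """
    value = {r: 0.0 for r in CANDIDATE_RATIOS}   # running reward estimates
    counts = {r: 0 for r in CANDIDATE_RATIOS}
    for _ in range(epochs):
        if random.random() < epsilon:            # explore a random ratio
            ratio = random.choice(CANDIDATE_RATIOS)
        else:                                     # exploit the best ratio so far
            ratio = max(value, key=value.get)
        train_step(build_batch(real_pool, fake_pool, ratio))
        reward = fake_recall()
        counts[ratio] += 1                        # incremental mean update
        value[ratio] += (reward - value[ratio]) / counts[ratio]
    return max(value, key=value.get)
```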
2023
KG-FLIP: Knowledge-guided Fashion-domain Language-Image Pre-training for E-commerce
Qinjin Jia | Yang Liu | Daoping Wu | Shaoyuan Xu | Huidong Liu | Jinmiao Fu | Roland Vollgraf | Bryan Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Various Vision-Language Pre-training (VLP) models (e.g., CLIP, BLIP) have sprung up and dramatically advanced the benchmarks for public general-domain datasets (e.g., COCO, Flickr30k). Such models usually learn cross-modal alignment from large-scale, well-aligned image-text datasets without leveraging external knowledge. Adapting these models to downstream applications in specific domains like fashion requires a fine-grained in-domain image-text corpus, which is usually less semantically aligned and smaller in scale, calling for efficient pre-training strategies. In this paper, we propose a knowledge-guided fashion-domain language-image pre-training (FLIP) framework that focuses on learning fine-grained representations in the e-commerce domain and utilizes external knowledge (i.e., the product attribute schema) to improve pre-training efficiency. Experiments demonstrate that FLIP outperforms previous state-of-the-art VLP models on Amazon data and on the Fashion-Gen dataset by large margins. FLIP has been successfully deployed in the Amazon catalog system to backfill missing attributes and improve the customer shopping experience.
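One way to read the knowledge-guided objective is as a standard CLIP-style contrastive image-text loss augmented with an auxiliary term that predicts schema attributes from the learned embeddings. The sketch below is a minimal illustration under that assumption; the module names, dimensions, and loss weighting are hypothetical and are not taken from the KG-FLIP implementation.

```python
# Minimal sketch of knowledge-guided image-text pre-training: a CLIP-style
# contrastive loss plus an auxiliary attribute-prediction loss driven by a
# product attribute schema. Names and weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeGuidedVLP(nn.Module):
    def __init__(self, image_encoder, text_encoder, embed_dim=256, num_attrs=1000):
        super().__init__()
        self.image_encoder = image_encoder                 # e.g., a ViT backbone
        self.text_encoder = text_encoder                   # e.g., a BERT backbone
        self.attr_head = nn.Linear(embed_dim, num_attrs)   # schema attribute classifier
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~ln(1/0.07)

    def forward(self, images, texts, attr_labels, alpha=0.5):
        img = F.normalize(self.image_encoder(images), dim=-1)
        txt = F.normalize(self.text_encoder(texts), dim=-1)

        # Symmetric image-text contrastive loss with in-batch negatives.
        logits = self.logit_scale.exp() * img @ txt.t()
        targets = torch.arange(len(images), device=logits.device)
        contrastive = (F.cross_entropy(logits, targets)
                       + F.cross_entropy(logits.t(), targets)) / 2

        # Knowledge-guided auxiliary loss: multi-label prediction of the
        # product attribute schema from the image embedding.
        attr_loss = F.binary_cross_entropy_with_logits(
            self.attr_head(img), attr_labels.float())
        return contrastive + alpha * attr_loss
```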
Co-authors
- Jinmiao Fu 1
- Yimeng Gu 1
- Qinjin Jia 1
- Yang Liu 1
- Qiang Liu 1
Venues
- ACL 2