Juan Cao


2022

pdf
Zoom Out and Observe: News Environment Perception for Fake News Detection
Qiang Sheng | Juan Cao | Xueyao Zhang | Rundong Li | Danding Wang | Yongchun Zhu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Fake news detection is crucial for preventing the dissemination of misinformation on social media. To differentiate fake news from real ones, existing methods observe the language patterns of the news post and “zoom in” to verify its content with knowledge sources or check its readers’ replies. However, these methods neglect the information in the external news environment where a fake news post is created and disseminated. The news environment represents recent mainstream media opinion and public attention, which is an important inspiration of fake news fabrication because fake news is often designed to ride the wave of popular events and catch public attention with unexpected novel content for greater exposure and spread. To capture the environmental signals of news posts, we “zoom out” to observe the news environment and propose the News Environment Perception Framework (NEP). For each post, we construct its macro and micro news environment from recent mainstream news. Then we design a popularity-oriented and a novelty-oriented module to perceive useful signals and further assist final prediction. Experiments on our newly built datasets show that the NEP can efficiently improve the performance of basic fake news detectors.

pdf
Improving Fake News Detection of Influential Domain via Domain- and Instance-Level Transfer
Qiong Nan | Danding Wang | Yongchun Zhu | Qiang Sheng | Yuhui Shi | Juan Cao | Jintao Li
Proceedings of the 29th International Conference on Computational Linguistics

Social media spreads both real news and fake news in various domains including politics, health, entertainment, etc. It is crucial to automatically detect fake news, especially for news of influential domains like politics and health because they may lead to serious social impact, e.g., panic in the COVID-19 pandemic. Some studies indicate the correlation between domains and perform multi-domain fake news detection. However, these multi-domain methods suffer from a seesaw problem that the performance of some domains is often improved by hurting the performance of other domains, which could lead to an unsatisfying performance in the specific target domains. To address this issue, we propose a Domain- and Instance-level Transfer Framework for Fake News Detection (DITFEND), which could improve the performance of specific target domains. To transfer coarse-grained domain-level knowledge, we train a general model with data of all domains from the meta-learning perspective. To transfer fine-grained instance-level knowledge and adapt the general model to a target domain, a language model is trained on the target domain to evaluate the transferability of each data instance in source domains and re-weight the instance’s contribution. Experiments on two real-world datasets demonstrate the effectiveness of DITFEND. According to both offline and online experiments, the DITFEND shows superior effectiveness for fake news detection.

2021

pdf
Article Reranking by Memory-Enhanced Key Sentence Matching for Detecting Previously Fact-Checked Claims
Qiang Sheng | Juan Cao | Xueyao Zhang | Xirong Li | Lei Zhong
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

False claims that have been previously fact-checked can still spread on social media. To mitigate their continual spread, detecting previously fact-checked claims is indispensable. Given a claim, existing works focus on providing evidence for detection by reranking candidate fact-checking articles (FC-articles) retrieved by BM25. However, these performances may be limited because they ignore the following characteristics of FC-articles: (1) claims are often quoted to describe the checked events, providing lexical information besides semantics; (2) sentence templates to introduce or debunk claims are common across articles, providing pattern information. Models that ignore the two aspects only leverage semantic relevance and may be misled by sentences that describe similar but irrelevant events. In this paper, we propose a novel reranker, MTM (Memory-enhanced Transformers for Matching) to rank FC-articles using key sentences selected with event (lexical and semantic) and pattern information. For event information, we propose a ROUGE-guided Transformer which is finetuned with regression of ROUGE. For pattern information, we generate pattern vectors for matching with sentences. By fusing event and pattern information, we select key sentences to represent an article and then predict if the article fact-checks the given claim using the claim, key sentences, and patterns. Experiments on two real-world datasets show that MTM outperforms existing methods. Human evaluation proves that MTM can capture key sentences for explanations.

2020

pdf
Integrating Semantic and Structural Information with Graph Convolutional Network for Controversy Detection
Lei Zhong | Juan Cao | Qiang Sheng | Junbo Guo | Ziang Wang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Identifying controversial posts on social media is a fundamental task for mining public sentiment, assessing the influence of events, and alleviating the polarized views. However, existing methods fail to 1) effectively incorporate the semantic information from content-related posts; 2) preserve the structural information for reply relationship modeling; 3) properly handle posts from topics dissimilar to those in the training set. To overcome the first two limitations, we propose Topic-Post-Comment Graph Convolutional Network (TPC-GCN), which integrates the information from the graph structure and content of topics, posts, and comments for post-level controversy detection. As to the third limitation, we extend our model to Disentangled TPC-GCN (DTPC-GCN), to disentangle topic-related and topic-unrelated features and then fuse dynamically. Extensive experiments on two real-world datasets demonstrate that our models outperform existing methods. Analysis of the results and cases proves that our models can integrate both semantic and structural information with significant generalizability.