Haiqi Zhang

2025

RATSD: Retrieval Augmented Truthfulness Stance Detection from Social Media Posts Toward Factual Claims
Zhengyuan Zhu | Zeyu Zhang | Haiqi Zhang | Chengkai Li
Findings of the Association for Computational Linguistics: NAACL 2025

Social media provides a valuable lens for assessing public perceptions and opinions. This paper focuses on the concept of truthfulness stance, which evaluates whether a textual utterance affirms, disputes, or remains neutral or indifferent toward a factual claim. Our systematic analysis fills a gap in the existing literature by offering the first in-depth conceptual framework encompassing various definitions of stance. We introduce RATSD (Retrieval Augmented Truthfulness Stance Detection), a novel method that leverages large language models (LLMs) with retrieval-augmented generation (RAG) to enhance the contextual understanding of tweets in relation to claims. RATSD is evaluated on TSD-CT, our newly developed dataset containing 3,105 claim-tweet pairs, along with existing benchmark datasets. Our experimental results demonstrate that RATSD outperforms state-of-the-art methods, achieving a significant increase in Macro-F1 score on TSD-CT. Our contributions establish a foundation for advancing research in misinformation analysis and provide valuable tools for understanding public perceptions in digital discourse.

LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media
Haiqi Zhang | Zhengyuan Zhu | Zeyu Zhang | Chengkai Li
Findings of the Association for Computational Linguistics: ACL 2025

With the rapid expansion of content on social media platforms, analyzing and comprehending online discourse has become increasingly complex. This paper introduces LLMTaxo, a novel framework leveraging large language models for the automated construction of taxonomies of factual claims from social media by generating topics at multiple levels of granularity. The resulting hierarchical structure significantly reduces redundancy and improves information accessibility. We also propose dedicated taxonomy evaluation metrics to enable comprehensive assessment. Evaluations conducted on three diverse datasets demonstrate LLMTaxo’s effectiveness in producing clear, coherent, and comprehensive taxonomies. Among the evaluated models, GPT-4o mini consistently outperforms others across most metrics. The framework’s flexibility and low reliance on manual intervention underscore its potential for broad applicability.

2024

Granular Analysis of Social Media Users’ Truthfulness Stances Toward Climate Change Factual Claims
Haiqi Zhang | Zhengyuan Zhu | Zeyu Zhang | Jacob Devasier | Chengkai Li
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)

Climate change poses an urgent global problem that requires efficient data analysis mechanisms to provide insights into climate-related discussions on social media platforms. This paper presents a framework aimed at understanding social media users’ perceptions of various climate change topics and uncovering the insights behind these perceptions. Our framework employs a large language model to develop a taxonomy of factual claims related to climate change and to build a classification model that detects the truthfulness stance of tweets toward the factual claims. The findings reveal two key conclusions: (1) the public tends to believe the claims are true, regardless of the actual claim veracity; (2) the public shows a lack of discernment between facts and misinformation across different topics, particularly in areas related to politics, economy, and environment.

2020

Just as SARS-CoV-2, a new form of coronavirus, continues to infect a growing number of people around the world, harmful misinformation about the outbreak also continues to spread. With the goal of combating misinformation, we designed and built Jennifer, a chatbot maintained by a global group of volunteers. With Jennifer, we hope to learn whether public information from reputable sources could be more effectively organized and shared in the wake of a crisis, as well as to understand the issues that the public were most immediately curious about. In this paper, we introduce Jennifer and describe the design of this proof-of-principle system. We also present lessons learned and discuss open challenges. Finally, to facilitate future research, we release COVID-19 Question Bank, a dataset of 3,924 COVID-19-related questions in 944 groups, gathered from our users and volunteers.
