Majid Zarharan
2024
Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models
Majid Zarharan | Pascal Wullschleger | Babak Behkam Kia | Mohammad Taher Pilehvar | Jennifer Foster
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
This paper presents a comprehensive analysis of explainable fact-checking through a series of experiments, focusing on the ability of large language models to verify public health claims and provide explanations or justifications for their veracity assessments. We examine the effectiveness of zero/few-shot prompting and parameter-efficient fine-tuning across various open and closed-source models, assessing their performance on both the isolated and joint tasks of veracity prediction and explanation generation. Importantly, we employ a dual evaluation approach that combines previously established automatic metrics with a novel set of criteria assessed through human evaluation. Our automatic evaluation indicates that, in the zero-shot scenario, GPT-4 is the standout performer, but that in few-shot and parameter-efficient fine-tuning settings, open-source models can not only bridge the performance gap but, in some instances, surpass GPT-4. Human evaluation reveals further nuance and indicates potential problems with the gold explanations.
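As a rough illustration of the zero-shot setting described above (not the authors' actual prompts or pipeline), the minimal sketch below asks an instruction-tuned model for a verdict and a short justification. It assumes the OpenAI chat-completions client, a placeholder claim/evidence pair, and PUBHEALTH-style verdict labels (true, false, mixture, unproven); none of these specifics are taken from the paper.

    # Minimal zero-shot sketch: joint veracity prediction and explanation generation.
    # Assumes OPENAI_API_KEY is set; claim, evidence, and label set are illustrative only.
    from openai import OpenAI

    client = OpenAI()

    def verify_claim(claim: str, evidence: str) -> str:
        """Ask the model for a verdict (true / false / mixture / unproven) plus a brief justification."""
        prompt = (
            "You are a public health fact-checker.\n"
            f"Claim: {claim}\n"
            f"Evidence: {evidence}\n"
            "First state a verdict (true, false, mixture, or unproven), "
            "then explain your reasoning in two or three sentences."
        )
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response.choices[0].message.content

    print(verify_claim(
        "Vitamin C cures the common cold.",
        "Clinical trials show vitamin C may slightly shorten cold duration but does not cure it.",
    ))

The same prompt could be extended with in-context examples for the few-shot setting, or replaced by a fine-tuned open-source model in the parameter-efficient fine-tuning setting the abstract refers to.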
2021
ParsFEVER: a Dataset for Farsi Fact Extraction and Verification
Majid Zarharan | Mahsa Ghaderan | Amin Pourdabiri | Zahra Sayedi | Behrouz Minaei-Bidgoli | Sauleh Eetemadi | Mohammad Taher Pilehvar
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics
Training and evaluation of automatic fact extraction and verification techniques require large amounts of annotated data, which might not be available for low-resource languages. This paper presents ParsFEVER: the first publicly available Farsi dataset for fact extraction and verification. We adopt the construction procedure of the standard English dataset for the task, i.e., FEVER, and improve it for the case of low-resource languages. Specifically, claims are extracted from sentences that are carefully selected to be more informative. The dataset comprises nearly 23K manually-annotated claims. Over 65% of the claims in ParsFEVER are many-hop (i.e., they require evidence from multiple sources), making the dataset a challenging benchmark (only 13% of the claims in FEVER are many-hop). Moreover, despite having a smaller training set (around one-ninth of that of FEVER), a model trained on ParsFEVER attains similar downstream performance, indicating the quality of the dataset. We release the dataset and the annotation guidelines at https://github.com/Zarharan/ParsFEVER.
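As a rough illustration only: since ParsFEVER follows the FEVER construction procedure, a FEVER-style JSONL layout is a plausible way to read the released claims. The file name train.jsonl and the claim/label field names in the sketch below are assumptions, not details confirmed by the abstract or the repository.

    # Hypothetical loader for FEVER-style JSONL annotations (one JSON object per line).
    # Field names and file layout are assumed; check the ParsFEVER repository for the actual schema.
    import json
    from collections import Counter

    def load_claims(path: str) -> list[dict]:
        """Read one JSON record per line, as in the original FEVER release."""
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]

    claims = load_claims("train.jsonl")
    print(Counter(record["label"] for record in claims))  # e.g. SUPPORTS / REFUTES / NOT ENOUGH INFO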