Leveraging Large Language Models for Fact Verification in Italian

Antonio Scaiella, Stefano Costanzo, Elisa Passone, Danilo Croce, Giorgio Gambosi


Abstract
In recent years, Automatic Fact Checking has become a crucial tool in combating fake news, leveraging AI to verify the accuracy of information. Despite significant advancements, datasets and models are predominantly available in English, posing challenges for other languages. This paper presents an Italian resource based on the dataset made available in the FEVER evaluation campaign, created to train and evaluate fact-checking models in Italian. The dataset comprises approximately 240k examples, with over 2k test examples manually validated. Additionally, we fine-tuned a state-of-the-art LLM, namely LLaMA3, on both the original English and translated Italian datasets, demonstrating that fine-tuning significantly improves model performance. Our results suggest that the fine-tuned models achieve comparable accuracy in both languages, highlighting the value of the proposed resource.
Anthology ID:
2024.clicit-1.97
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
898–908
URL:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.clicit-1.97/
Cite (ACL):
Antonio Scaiella, Stefano Costanzo, Elisa Passone, Danilo Croce, and Giorgio Gambosi. 2024. Leveraging Large Language Models for Fact Verification in Italian. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 898–908, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Leveraging Large Language Models for Fact Verification in Italian (Scaiella et al., CLiC-it 2024)
PDF:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.clicit-1.97.pdf