Abstract
This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese using the CHEF dataset. To better reflect real-world fact-checking, we first develop a novel Chinese document-level evidence retriever, achieving state-of-the-art performance. We then demonstrate the limitations of translation-based methods and multilingual language models, highlighting the need for language-specific systems. To better analyze token-level biases in different systems, we construct an adversarial dataset based on the CHEF dataset, where each instance has a large word overlap with the original one but holds the opposite veracity label. Experimental results on the CHEF dataset and our adversarial dataset show that our proposed method outperforms translation-based methods and multilingual language models and is more robust to biases, emphasizing the importance of language-specific fact-checking systems.

- Anthology ID: 2024.emnlp-main.113
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 1899–1914
- URL: https://aclanthology.org/2024.emnlp-main.113
- DOI: 10.18653/v1/2024.emnlp-main.113
- Cite (ACL): Caiqi Zhang, Zhijiang Guo, and Andreas Vlachos. 2024. Do We Need Language-Specific Fact-Checking Models? The Case of Chinese. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1899–1914, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Do We Need Language-Specific Fact-Checking Models? The Case of Chinese (Zhang et al., EMNLP 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.emnlp-main.113.pdf