On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs

Herun Wan, Minnan Luo, Zhixiong Su, Guang Dai, Xiang Zhao


Abstract
Evidence-enhanced detectors have demonstrated remarkable ability in identifying malicious social text. However, the rise of large language models (LLMs) introduces the risk of evidence pollution, where manipulated evidence confuses detectors. This paper explores potential manipulation scenarios, including basic pollution and the rephrasing or generation of evidence by LLMs. To mitigate the negative impact, we propose three defense strategies from both the data and model sides: machine-generated text detection, a mixture of experts, and parameter updating. Extensive experiments on four malicious social text detection tasks with ten datasets show that evidence pollution significantly compromises detectors, with the generating strategy causing up to a 14.4% performance drop. Meanwhile, the defense strategies can mitigate evidence pollution, but they face limitations in practical deployment. Further analysis shows that polluted evidence (i) is of high quality, as judged by automatic metrics and human evaluation; (ii) degrades model calibration, increasing the expected calibration error by up to 21.6%; and (iii) can be combined to amplify the negative impact, especially for encoder-based LMs, whose accuracy drops by 21.8%.
Anthology ID:
2025.acl-long.480
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
9731–9761
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.480/
Cite (ACL):
Herun Wan, Minnan Luo, Zhixiong Su, Guang Dai, and Xiang Zhao. 2025. On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9731–9761, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs (Wan et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.480.pdf