How to Generalize the Detection of AI-Generated Text: Confounding Neurons

Claudio Borile; Carlo Abrate

doi:10.18653/v1/2025.findings-emnlp.1388


Abstract

Detectors of LLM-generated text generalize poorly under domain shift. Yet reliable text detection methods in the wild are of paramount importance for plagiarism detection, the integrity of public discourse, and AI safety. Linguistic and domain confounders introduce spurious correlations, leading to poor out-of-distribution (OOD) performance. In this work we introduce the concept of confounding neurons: individual neurons within transformer-based detectors that encode dataset-specific biases rather than task-specific signals. Leveraging confounding neurons, we propose a novel post-hoc, neuron-level intervention framework to disentangle AI-generated text detection factors from data-specific biases. Through extensive experiments we show that it effectively reduces topic-specific biases, improving the model's ability to generalize across domains.
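The abstract does not specify how confounding neurons are identified or neutralized. Purely as an illustration, and not the authors' procedure, the sketch below shows one simple kind of post-hoc, neuron-level intervention. It scores each neuron by how strongly its activation correlates with a domain label versus the human/AI label, flags the neurons that track the domain more than the task, and replaces their activations with the dataset mean (mean ablation) without retraining the detector. The correlation criterion, the `threshold` value, mean ablation, and every function name here are assumptions made for this example.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def find_confounding_neurons(acts, task_labels, domain_labels, threshold=0.5):
    """Flag neurons whose activation tracks the domain more than the task.

    acts: one activation vector per text, all of the same length.
    Hypothetical criterion (an assumption, not from the paper):
    |corr(neuron, domain)| >= threshold and it exceeds |corr(neuron, task)|.
    """
    flagged = []
    for j in range(len(acts[0])):
        column = [a[j] for a in acts]
        r_dom = abs(pearson(column, domain_labels))
        r_task = abs(pearson(column, task_labels))
        if r_dom >= threshold and r_dom > r_task:
            flagged.append(j)
    return flagged


def mean_ablate(acts, neurons):
    """Post-hoc intervention: overwrite flagged neurons with their dataset mean."""
    means = {j: mean(a[j] for a in acts) for j in neurons}
    return [[means.get(j, v) for j, v in enumerate(a)] for a in acts]


if __name__ == "__main__":
    # Toy data: 4 texts x 3 neurons.
    # Neuron 0 follows the human(0)/AI(1) label, neuron 1 follows the
    # domain (topic) label, and neuron 2 is constant.
    acts = [[0.1, 0.0, 0.5], [0.9, 0.1, 0.5], [0.2, 1.0, 0.5], [0.8, 0.9, 0.5]]
    task = [0, 1, 0, 1]
    domain = [0, 0, 1, 1]
    flagged = find_confounding_neurons(acts, task, domain)
    print(flagged)  # flags neuron 1, the domain-tracking one
    print(mean_ablate(acts, flagged))
```

Mean ablation is used here instead of zeroing so that downstream layers still receive a typical value for the flagged neuron. Because only activations are edited at inference time, the detector's weights stay untouched, which is what makes this kind of intervention post-hoc.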

Anthology ID: 2025.findings-emnlp.1388
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 25461–25476
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1388/
DOI: 10.18653/v1/2025.findings-emnlp.1388
Cite (ACL): Claudio Borile and Carlo Abrate. 2025. How to Generalize the Detection of AI-Generated Text: Confounding Neurons. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25461–25476, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): How to Generalize the Detection of AI-Generated Text: Confounding Neurons (Borile & Abrate, Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1388.pdf
Checklist: 2025.findings-emnlp.1388.checklist.pdf
