SENTRA: Selected-Next-Token Transformer for LLM Text Detection

Mitchell Plyler, Yilun Zhang, Alexander Tuzhilin, Saoud Khalifah, Sen Tian


Abstract
LLMs are becoming increasingly capable and widespread. Consequently, the potential and reality of their misuse are also growing. In this work, we address the problem of detecting LLM-generated text that is not explicitly declared as such. We present a novel, general-purpose, and supervised LLM text detector, SElected-Next-Token tRAnsformer (SENTRA). SENTRA is a Transformer-based encoder leveraging selected-next-token-probability sequences and utilizing contrastive pre-training on large amounts of unlabeled data. Our experiments on three popular public datasets across 24 domains of text demonstrate SENTRA is a general-purpose classifier that significantly outperforms popular baselines in the out-of-domain setting.
Anthology ID:
2025.findings-emnlp.1004
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
18499–18516
URL:
https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.1004/
DOI:
10.18653/v1/2025.findings-emnlp.1004
Cite (ACL):
Mitchell Plyler, Yilun Zhang, Alexander Tuzhilin, Saoud Khalifah, and Sen Tian. 2025. SENTRA: Selected-Next-Token Transformer for LLM Text Detection. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18499–18516, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
SENTRA: Selected-Next-Token Transformer for LLM Text Detection (Plyler et al., Findings 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.1004.pdf
Checklist:
2025.findings-emnlp.1004.checklist.pdf