SubmissionNumber#=%=#174
FinalPaperTitle#=%=#PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?
ShortPaperTitle#=%=#
NumberOfPages#=%=#8
CopyrightSigned#=%=#Kseniia
JobTitle#==#
Organization#==#
Abstract#==#In this paper, we present our submission to SemEval-2024 Task 8, "Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection", focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach combines embeddings from RoBERTa-base with diversity features and uses a resampled training set. We rank 16th out of 139 in Subtask A, and our results show that our approach generalizes to unseen models and domains, achieving an accuracy of 0.91.
Author{1}{Firstname}#=%=#Kseniia
Author{1}{Lastname}#=%=#Petukhova
Author{1}{Username}#=%=#kpetyxova
Author{1}{Email}#=%=#kapetukhova@gmail.com
Author{1}{Affiliation}#=%=#MBZUAI
Author{2}{Firstname}#=%=#Roman
Author{2}{Lastname}#=%=#Kazakov
Author{2}{Username}#=%=#sachertort
Author{2}{Email}#=%=#romankazakov.krm@gmail.com
Author{2}{Affiliation}#=%=#Mohamed bin Zayed University of Artificial Intelligence
Author{3}{Firstname}#=%=#Ekaterina
Author{3}{Lastname}#=%=#Kochmar
Author{3}{Username}#=%=#ekochmar
Author{3}{Email}#=%=#ekaterina.kochmar@gmail.com
Author{3}{Affiliation}#=%=#MBZUAI

==========