DUTh at SemEval 2024 Task 8: Comparing classic Machine Learning Algorithms and LLM based methods for Multigenerator, Multidomain and Multilingual Machine-Generated Text Detection

Theodora Kyriakou; Ioannis Maslaris; Avi Arampatzis

doi:10.18653/v1/2024.semeval-1.156

Abstract

Text-generative models are evolving rapidly. Although they are very useful tools for many people, they have also raised concerns for various reasons. This paper presents our work on two of the three subtasks of SemEval-2024 Task 8. The shared task seeks automatic models that make it easier to classify text as AI-generated or human-written. After trying different preprocessing approaches, several Machine Learning algorithms, and some LLMs, our team settled on mBERT, XLM-RoBERTa, and BERT for the subtasks we submitted. We report both positive and negative results, so that future researchers are informed about what works and what doesn't.

Anthology ID: 2024.semeval-1.156
Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 1080–1086
URL: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2024.semeval-1.156/
DOI: 10.18653/v1/2024.semeval-1.156
Cite (ACL): Theodora Kyriakou, Ioannis Maslaris, and Avi Arampatzis. 2024. DUTh at SemEval 2024 Task 8: Comparing classic Machine Learning Algorithms and LLM based methods for Multigenerator, Multidomain and Multilingual Machine-Generated Text Detection. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1080–1086, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): DUTh at SemEval 2024 Task 8: Comparing classic Machine Learning Algorithms and LLM based methods for Multigenerator, Multidomain and Multilingual Machine-Generated Text Detection (Kyriakou et al., SemEval 2024)
PDF: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2024.semeval-1.156.pdf
Supplementary material: 2024.semeval-1.156.SupplementaryMaterial.txt
