Groningen Team F at SemEval-2024 Task 8: Detecting Machine-Generated Text using Feature-Based Machine Learning Models

Rina Donker; Björn Overbeek; Dennis Thulden; Oscar Zwagers

doi:10.18653/v1/2024.semeval-1.268

Abstract

Large language models (LLMs) have shown a remarkable ability to produce fluent responses to a wide variety of user queries. However, this ability also raises concerns about the spread of misinformation and potential misuse in educational contexts. In this paper, we describe our contribution to SemEval-2024 Task 8 (Wang et al., 2024), a shared task on detecting machine-generated text. We aim to build several feature-based models that can detect whether a text is machine-generated or human-written. Our models obtained an accuracy of 0.74 on the binary human-written vs. machine-generated text classification task (Subtask A, monolingual) and an accuracy of 0.61 on the multi-way machine-generated text classification task (Subtask B). Future work could explore additional features and models.

Anthology ID: 2024.semeval-1.268
Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 1919–1925
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.semeval-1.268/
DOI: 10.18653/v1/2024.semeval-1.268
Cite (ACL): Rina Donker, Björn Overbeek, Dennis Thulden, and Oscar Zwagers. 2024. Groningen Team F at SemEval-2024 Task 8: Detecting Machine-Generated Text using Feature-Based Machine Learning Models. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1919–1925, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Groningen Team F at SemEval-2024 Task 8: Detecting Machine-Generated Text using Feature-Based Machine Learning Models (Donker et al., SemEval 2024)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.semeval-1.268.pdf
Supplementary material: 2024.semeval-1.268.SupplementaryMaterial.zip
Supplementary material: 2024.semeval-1.268.SupplementaryMaterial.txt
