SemEval Task 8: A Comparison of Traditional and Neural Models for Detecting Machine Authored Text

Srikar Kashyap Pulipaka, Shrirang Mhalgi, Joseph Larson, Sandra Kübler


Abstract
Since Large Language Models have reached a stage where it is becoming more and more difficult to distinguish between human-written and machine-written text, there is an increasing need for automated systems to distinguish between them. As part of SemEval Task 8, Subtask A: Binary Human-Written vs. Machine-Generated Text Classification, we explore a variety of machine learning classifiers, from traditional statistical methods, such as Naïve Bayes and Decision Trees, to fine-tuned transformer models, such as RoBERTa and ALBERT. Our findings show that using a fine-tuned RoBERTa model with optimized hyperparameters yields the best accuracy. However, the improvement does not translate to the test set because of the differences in distribution between the development and test sets.
Anthology ID:
2024.semeval-1.148
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1026–1031
URL:
https://aclanthology.org/2024.semeval-1.148
Cite (ACL):
Srikar Kashyap Pulipaka, Shrirang Mhalgi, Joseph Larson, and Sandra Kübler. 2024. SemEval Task 8: A Comparison of Traditional and Neural Models for Detecting Machine Authored Text. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1026–1031, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
SemEval Task 8: A Comparison of Traditional and Neural Models for Detecting Machine Authored Text (Pulipaka et al., SemEval 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.semeval-1.148.pdf
Supplementary material:
 2024.semeval-1.148.SupplementaryMaterial.zip
Supplementary material:
 2024.semeval-1.148.SupplementaryMaterial.txt