CUNLP at SemEval-2024 Task 8: Classify Human and AI Generated Text

Pranjal Aggarwal, Deepanshu Sachdeva


Abstract
This paper describes our system for SemEval-2024 Task 8, which aims to distinguish AI-generated text from human-written text. With the advent of generative models such as GPT-3.5 and GPT-4, telling the two apart has become increasingly necessary for applications such as plagiarism detection and fake news detection, where failures can have real-world impact, for instance stock manipulation through AI-generated news articles. We start with basic models such as Logistic Regression and work our way up to more complex models such as transformers and GPTs. The task is binary classification, where label 1 represents AI-generated text and label 0 represents human-written text. The dataset was provided in a JSON-style format, which we converted to comma-separated values (CSV) files for easier processing with the pandas library in Python, as CSV files provide more readability than JSON files. We also used ensemble approaches such as a Bagging classifier and a Voting classifier.
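The JSON-to-CSV conversion mentioned in the abstract could be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the data is in JSON Lines form (one object per line) with hypothetical field names `text` and `label`, and it uses in-memory buffers and made-up sample rows instead of the real task files.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real field names and file layout are assumptions.
raw_jsonl = io.StringIO(
    '{"text": "The market rallied today, analysts said.", "label": 1}\n'
    '{"text": "I wrote this note by hand.", "label": 0}\n'
)

# lines=True tells pandas to parse one JSON object per line.
df = pd.read_json(raw_jsonl, lines=True)

# With no path argument, to_csv returns the CSV content as a string.
csv_text = df.to_csv(index=False)

# Read the CSV back to confirm that text and labels survive the conversion.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```

In practice the buffers would be replaced by file paths, for example `pd.read_json(path, lines=True)` followed by `df.to_csv(out_path, index=False)`.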
Anthology ID:
2024.semeval-1.1
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1–6
URL:
https://aclanthology.org/2024.semeval-1.1
Cite (ACL):
Pranjal Aggarwal and Deepanshu Sachdeva. 2024. CUNLP at SemEval-2024 Task 8: Classify Human and AI Generated Text. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1–6, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
CUNLP at SemEval-2024 Task 8: Classify Human and AI Generated Text (Aggarwal & Sachdeva, SemEval 2024)
PDF:
https://preview.aclanthology.org/ingestion-checklist/2024.semeval-1.1.pdf
Supplementary material:
 2024.semeval-1.1.SupplementaryMaterial.txt