2024
Human vs Machine: An Automated Machine-Generated Text Detection Approach
Urwah Jawaid, Rudra Roy, Pritam Pal, Srijani Debnath, Dipankar Das, Sivaji Bandyopadhyay
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
With the advancement of natural language processing (NLP) and sophisticated Large Language Models (LLMs), distinguishing human-written text from machine-generated text has become increasingly difficult. This paper presents a systematic approach to separating machine-generated text from human-written text that combines a transformer-based model with a post-processing technique based on textual features. We extracted five textual features (readability score, stop word score, spelling and grammatical error count, unique word score and human phrase count) from human-written and machine-generated texts separately and trained three machine learning models (SVM, Random Forest and XGBoost) on these scores. Beyond these traditional machine learning models, we explored a BiLSTM and the transformer-based DistilBERT model to improve classification performance. Trained and evaluated on a large dataset containing both human-written and machine-generated text, our best-performing framework achieves an accuracy of 87.5%.
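The abstract names the textual features but not how each is computed. The following is a minimal sketch of per-text feature extraction under stated assumptions, not the authors' implementation: the stop word and "human phrase" lists are small hypothetical examples, readability is assumed to be Flesch Reading Ease with a rough syllable heuristic, and the spelling and grammatical error count is left out because it would need an external checker.

```python
import re

# Hypothetical word lists for illustration only; the paper does not publish its own.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "this", "for", "on", "with", "as", "was", "are", "be"}
HUMAN_PHRASES = ["i think", "to be honest", "you know", "kind of", "i guess"]


def tokenize(text):
    """Lowercased word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    """Rough vowel-group heuristic (assumption); dictionary-based tools are more accurate."""
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final silent 'e' as non-syllabic when other vowel groups exist.
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease, assumed here as the readability score."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = tokenize(text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


def extract_features(text):
    """Hypothetical feature dict; the error-count feature is omitted."""
    words = tokenize(text)
    n = max(len(words), 1)  # avoid division by zero on empty input
    lowered = text.lower()
    return {
        "readability": flesch_reading_ease(text),
        # Fraction of tokens that are stop words.
        "stop_word_score": sum(w in STOP_WORDS for w in words) / n,
        # Type-token ratio: distinct tokens over all tokens.
        "unique_word_score": len(set(words)) / n,
        # Whole-phrase matches of the hypothetical human-phrase list.
        "human_phrase_count": sum(
            len(re.findall(r"\b" + re.escape(p) + r"\b", lowered))
            for p in HUMAN_PHRASES
        ),
    }


if __name__ == "__main__":
    print(extract_features("I think the cat sat on the mat. It was, to be honest, a nice mat."))
```

Each returned dict could be turned into a fixed-order numeric vector and stacked into a feature matrix for classifiers such as SVM, Random Forest or XGBoost; the specific feature definitions above are illustrative choices, not the ones reported in the paper.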